2015 | Original Paper | Book Chapter

Indoor Objects and Outdoor Urban Scenes Recognition by 3D Visual Primitives

Authors: Junsheng Fu, Joni-Kristian Kämäräinen, Anders Glent Buch, Norbert Krüger

Published in: Computer Vision - ACCV 2014 Workshops

Publisher: Springer International Publishing

Abstract

Object detection, recognition and pose estimation in 3D images have gained momentum due to the availability of 3D sensors (RGB-D) and the growth of large-scale 3D data, such as city maps. The most popular approach is to extract and match 3D shape descriptors that encode local scene structure but omit visual appearance. Visual appearance can be problematic due to imaging distortions, but the assumption that local shape structures are sufficient to recognise objects and scenes is largely invalid in practice, since objects may have similar shape but different texture (e.g., grocery packages). In this work, we propose an alternative appearance-driven approach that first extracts 2D primitives justified by Marr’s primal sketch; these are “accumulated” over multiple views, and the most stable ones are “promoted” to 3D visual primitives. The promoted 3D primitives represent both structure and appearance. For recognition, we propose a fast and effective correspondence matching method using random sampling. For quantitative evaluation, we construct a semi-synthetic benchmark dataset using a public 3D model dataset of 119 kitchen objects and another benchmark of challenging street-view images from 4 different cities. In the experiments, our method uses only a stereo view for training. As a result, on the kitchen objects dataset our method achieved an almost perfect recognition rate for \(\pm 10^\circ \) camera viewpoint change and nearly 80 % for \(\pm 20^\circ \); on the street-view benchmarks it achieved 75 % accuracy for 160 street-view image pairs, 80 % for 96 image pairs, and 92 % for 48 image pairs.
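
The abstract mentions correspondence matching by random sampling without giving details. As a rough illustration only, the following sketch shows a generic RANSAC-style loop over putative 3D point correspondences; it is not the authors' algorithm. All names (`frame`, `rigid_from_triple`, `ransac_rigid`) and the parameters `iterations`, `tol` and `seed` are hypothetical, and the minimal solver (aligning orthonormal frames built from three points) is a swapped-in textbook technique chosen so the example needs only the standard library.

```python
import math
import random

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    """Normalise a vector; return None if it is (almost) zero length."""
    n = math.sqrt(dot(a, a))
    return None if n < 1e-9 else tuple(x / n for x in a)

def frame(p0, p1, p2):
    """Right-handed orthonormal frame of a point triple, or None if degenerate."""
    e1 = unit(sub(p1, p0))
    if e1 is None:
        return None
    e3 = unit(cross(e1, sub(p2, p0)))
    if e3 is None:  # the three points are collinear
        return None
    return (e1, cross(e3, e1), e3)

def transform(model, p):
    """Apply a rigid model (R, t) to point p: R p + t."""
    R, t = model
    return tuple(dot(row, p) + ti for row, ti in zip(R, t))

def rigid_from_triple(src, dst):
    """Hypothetical minimal solver: rotation that maps the source frame onto
    the destination frame, plus the translation that aligns the first points."""
    fs, fd = frame(*src), frame(*dst)
    if fs is None or fd is None:
        return None
    R = [[sum(fd[k][i] * fs[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = sub(dst[0], transform((R, (0.0, 0.0, 0.0)), src[0]))
    return R, t

def ransac_rigid(matches, iterations=200, tol=0.05, seed=0):
    """Generic RANSAC over (source, destination) 3D point pairs.
    Returns the model with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(matches, 3)
        model = rigid_from_triple([s for s, _ in sample],
                                  [d for _, d in sample])
        if model is None:
            continue
        inliers = [i for i, (s, d) in enumerate(matches)
                   if math.dist(transform(model, s), d) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Calling `ransac_rigid(matches)` on a list of `(source_point, destination_point)` tuples returns a candidate pose and the indices of the correspondences that agree with it. With noisy measurements the three-point solver is only approximate, so a practical pipeline would typically refit the pose to all inliers with a least-squares estimate.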

Metadata
Title
Indoor Objects and Outdoor Urban Scenes Recognition by 3D Visual Primitives
Authors
Junsheng Fu
Joni-Kristian Kämäräinen
Anders Glent Buch
Norbert Krüger
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-16628-5_20