2016 | OriginalPaper | Book chapter

Nonparametric Scene Parsing via Label Transfer

Written by: Ce Liu, Jenny Yuen, Antonio Torralba

Published in: Dense Image Correspondences for Computer Vision

Publisher: Springer International Publishing

Abstract

Although much recent work has addressed object recognition and image understanding, it has focused on carefully constructing mathematical models of images, scenes, and objects. In this chapter, we propose a novel nonparametric approach to object recognition and scene parsing based on a new technique we call label transfer. For an input image, our system first retrieves its nearest neighbors from a large database of fully annotated images. It then establishes dense correspondences between the input image and each nearest neighbor using the dense SIFT flow algorithm (Liu et al., IEEE Trans. Pattern Anal. Mach. Intell. 33(5):978–994, 2011; Chap. 2), which aligns two images based on local image structures. Finally, using the dense scene correspondences obtained from SIFT flow, the system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Our nonparametric scene parsing system achieves promising experimental results on challenging databases. Compared with existing object recognition approaches that require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
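The three stages described in the abstract (retrieval, warping of annotations, fusion) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each image is summarized by a hypothetical global feature vector, the dense correspondence field is passed in as an integer array rather than computed by SIFT flow, and a per-pixel majority vote stands in for the Markov random field. The function names `retrieve_neighbors`, `warp_labels`, and `vote_labels` are made up for illustration.

```python
import numpy as np

def retrieve_neighbors(query_feat, db_feats, k):
    """Return indices of the k database images closest to the query.

    Assumption: images are compared by Euclidean distance between
    global feature vectors (one row of db_feats per database image).
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

def warp_labels(labels, flow):
    """Pull an annotation map into the query frame along a flow field.

    flow[y, x] = (dy, dx) means query pixel (y, x) corresponds to pixel
    (y + dy, x + dx) in the annotated image. The flow is an input here;
    computing it (e.g. with SIFT flow) is outside this sketch.
    Targets that fall outside the image are clamped to the border.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.clip(ys + flow[..., 0], 0, h - 1)
    tx = np.clip(xs + flow[..., 1], 0, w - 1)
    return labels[ty, tx]

def vote_labels(warped_stack, num_classes):
    """Fuse warped annotations by per-pixel majority vote.

    Assumption: this vote is a simplified stand-in for the Markov
    random field mentioned in the abstract. warped_stack has shape
    (num_neighbors, height, width) and holds integer class labels.
    """
    votes = np.stack([(warped_stack == c).sum(axis=0)
                      for c in range(num_classes)])
    return votes.argmax(axis=0)
```

In use, one would call `retrieve_neighbors` once per query, `warp_labels` once per retrieved neighbor with that neighbor's flow field, stack the warped maps along a new first axis, and pass the stack to `vote_labels`. Ties in the vote resolve to the lowest class index, which is a property of `argmax` and not a modeling choice.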

Footnotes
1
Other scene parsing and image understanding systems also require such a database. Our requirements are no greater than theirs.
 
2
SIFT descriptors are computed at each pixel using a 16 × 16 window. The window is divided into 4 × 4 cells, and image gradients within each cell are quantized into an 8-bin histogram. With 4 × 4 = 16 cells and 8 bins per cell, the pixel-wise SIFT feature is therefore a 128-D vector.
 
3
This extrapolation differs from moving to the larger database in Sect. 5.2, which includes indoor scenes. The number is expected only if the added images are similar to those in the LMO database.
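The descriptor layout in footnote 2 (a 16 × 16 window, a 4 × 4 grid of cells, 8 orientation bins per cell) can be checked with a short Python sketch. This is a simplified illustration, not the chapter's SIFT code: it omits Gaussian weighting, bin interpolation, and normalization, and the function name `sift_like_descriptor` and its parameters are hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, bins=8):
    """Gradient-orientation histograms over a grid x grid layout of cells.

    Assumptions: patch is square with a side divisible by grid; each
    pixel votes into one orientation bin, weighted by gradient magnitude.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)               # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)    # orientation in [0, 2*pi)
    # Clamp guards against rounding that would push an angle into bin `bins`.
    bin_idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    cell = patch.shape[0] // grid             # 16 // 4 = 4 pixels per cell side
    desc = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            desc[i, j] = np.bincount(bin_idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=bins)
    return desc.ravel()                       # grid * grid * bins values

# A horizontal intensity ramp as a toy 16 x 16 window.
ramp = np.tile(np.arange(16, dtype=float), (16, 1))
print(sift_like_descriptor(ramp).shape)       # (128,)
```

With the default arguments the output length is 4 × 4 × 8 = 128, matching the dimensionality stated in footnote 2.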
 
References
1. Adelson, E.H.: On seeing stuff: the perception of materials by humans and machines. In: SPIE, Human Vision and Electronic Imaging VI, pp. 1–12 (2001)
2. Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape matching and object recognition. In: Advances in Neural Information Processing Systems (NIPS) (2000)
3. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondence. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
4. Borg, I., Groenen, P.: Modern Multidimensional Scaling: Theory and Applications, 2nd edn. Springer, New York (2005)
5. Branson, S., Wah, C., Babenko, B., Schroff, F., Welinder, P., Perona, P., Belongie, S.: Visual recognition with humans in the loop. In: European Conference on Computer Vision (ECCV) (2010)
6. Choi, M.J., Lim, J.J., Torralba, A., Willsky, A.: Exploiting hierarchical context on a large database of object categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
7. Crandall, D., Felzenszwalb, P., Huttenlocher, D.: Spatial priors for part-based recognition using statistical models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
9. Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for multi-class object layout. In: IEEE International Conference on Computer Vision (ICCV) (2009)
10. Divvala, S.K., Hoiem, D., Hays, J.H., Efros, A.A., Hebert, M.: An empirical study of context in object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
11. Edwards, G., Cootes, T., Taylor, C.: Face recognition using active appearance models. In: European Conference on Computer Vision (ECCV) (1998)
12. Efros, A.A., Leung, T.: Texture synthesis by non-parametric sampling. In: IEEE International Conference on Computer Vision (ICCV) (1999)
13. Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)
14. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
15. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2003)
16. Frome, A., Singer, Y., Malik, J.: Image retrieval and classification using local distance functions. In: Advances in Neural Information Processing Systems (NIPS) (2006)
17. Galleguillos, C., McFee, B., Belongie, S., Lanckriet, G.R.G.: Multi-class object localization by combining local contextual interactions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
18. Grauman, K., Darrell, T.: Pyramid match kernels: Discriminative classification with sets of image features. In: IEEE International Conference on Computer Vision (ICCV) (2005)
19. Gupta, A., Davis, L.S.: Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In: European Conference on Computer Vision (ECCV) (2008)
20. Hays, J., Efros, A.A.: Scene completion using millions of photographs. ACM SIGGRAPH 26(3) (2007)
21. Heitz, G., Koller, D.: Learning spatial context: using stuff to find things. In: European Conference on Computer Vision (ECCV) (2008)
22. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
23. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. II, pp. 2169–2178 (2006)
24. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
25. Liang, L., Liu, C., Xu, Y.Q., Guo, B.N., Shum, H.Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. (TOG) 20(3), 127–150 (2001)
26. Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: dense correspondence across different scenes. In: European Conference on Computer Vision (ECCV) (2008)
27. Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
28. Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across different scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
29. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
30. Murphy, K.P., Torralba, A., Freeman, W.T.: Using the forest to see the trees: a graphical model relating features, objects, and scenes. In: Advances in Neural Information Processing Systems (NIPS) (2003)
31. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
32. Obdrzalek, S., Matas, J.: Sub-linear indexing for large scale object recognition. In: British Machine Vision Conference (2005)
33. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
34. Park, D., Ramanan, D., Fowlkes, C.: Multiresolution models for object detection. In: European Conference on Computer Vision (ECCV) (2010)
35. Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., Belongie, S.: Objects in context. In: IEEE International Conference on Computer Vision (ICCV) (2007)
36. Russell, B.C., Torralba, A., Liu, C., Fergus, R., Freeman, W.T.: Object recognition by scene alignment. In: Advances in Neural Information Processing Systems (NIPS) (2007)
37. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)
38. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Segmenting scenes by matching image composites. In: Advances in Neural Information Processing Systems (NIPS) (2009)
39. Savarese, S., Winn, J., Criminisi, A.: Discriminative object class models of appearance and shape by correlatons. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
40. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter sensitive hashing. In: IEEE International Conference on Computer Vision (ICCV) (2003)
41. Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007)
42. Shotton, J., Winn, J., Rother, C., Criminisi, A.: Textonboost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81(1), 2–23 (2009)
43. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: IEEE International Conference on Computer Vision (ICCV) (2003)
44. Sudderth, E., Torralba, A., Freeman, W.T., Willsky, A.: Describing visual scenes using transformed Dirichlet processes. In: Advances in Neural Information Processing Systems (NIPS) (2005)
45. Tighe, J., Lazebnik, S.: Superparsing: Scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision (ECCV) (2010)
46. Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large dataset for non-parametric object and scene recognition. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2008)
47. Turk, M., Pentland, A.: Face recognition using eigenfaces. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1991)
48. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2001)
49. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: European Conference on Computer Vision (ECCV) (2000)
50. Winn, J., Criminisi, A., Minka, T.: Object categorization by learned universal visual dictionary. In: IEEE International Conference on Computer Vision (ICCV) (2005)
51. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
52. Yang, Y., Hallman, S., Ramanan, D., Fowlkes, C.: Layered object detection for multi-class segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
Metadata
Title
Nonparametric Scene Parsing via Label Transfer
Written by
Ce Liu
Jenny Yuen
Antonio Torralba
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-23048-1_10
