Published in: International Journal of Computer Vision 1/2012

1 October 2012

Improving Image Classification Using Semantic Attributes

Authors: Yu Su, Frédéric Jurie

Abstract

The Bag-of-Words (BoW) model, commonly used for image classification, has two strong limitations: on the one hand, visual words lack explicit meaning; on the other hand, they are usually polysemous. This paper addresses both limitations by introducing an intermediate representation based on semantic attributes. Specifically, two different approaches are proposed. Both consist of predicting a set of semantic attributes for entire images as well as for local image regions, and of using these predictions to build intermediate-level features. Experiments on four challenging image databases (PASCAL VOC 2007, Scene-15, MSRCv2 and SUN-397) show that both approaches significantly improve the performance of the BoW model. Moreover, their combination achieves state-of-the-art results on several of these databases.
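
The general idea of an attribute-based intermediate representation can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the classifiers, the pooling scheme, or the attribute vocabulary, so the linear attribute classifiers with sigmoid outputs, the max-pooling over regions, the random weights, and all function names below are assumptions made purely for illustration.

```python
import numpy as np


def bow_histogram(word_ids, vocab_size):
    """L1-normalised histogram of visual-word indices."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def attribute_scores(hist, weights, biases):
    """Scores in (0, 1) from hypothetical linear attribute classifiers."""
    return 1.0 / (1.0 + np.exp(-(weights @ hist + biases)))


def attribute_feature(image_words, region_words, weights, biases, vocab_size):
    """Concatenate image-level scores with max-pooled region-level scores."""
    global_scores = attribute_scores(
        bow_histogram(image_words, vocab_size), weights, biases
    )
    region_scores = np.stack([
        attribute_scores(bow_histogram(r, vocab_size), weights, biases)
        for r in region_words
    ])
    # Max-pooling over regions is an assumed choice, not taken from the paper.
    return np.concatenate([global_scores, region_scores.max(axis=0)])


# Toy data: random visual words and random (untrained) attribute classifiers.
rng = np.random.default_rng(0)
vocab_size, n_attributes = 50, 4
weights = rng.normal(size=(n_attributes, vocab_size))
biases = np.zeros(n_attributes)
image_words = rng.integers(0, vocab_size, size=200)
region_words = [image_words[:100], image_words[100:]]

feature = attribute_feature(image_words, region_words, weights, biases, vocab_size)
print(feature.shape)  # (8,): 4 image-level + 4 region-level attribute scores
```

In a setting like the one the abstract describes, a vector of this kind would be passed to a standard image classifier alongside or instead of the raw BoW histogram.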


Metadata
Title
Improving Image Classification Using Semantic Attributes
Authors
Yu Su
Frédéric Jurie
Publication date
1 October 2012
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2012
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-012-0529-4
