Published in: International Journal of Computer Vision 1/2017

12.07.2016

Image Annotation by Propagating Labels from Semantic Neighbourhoods

Authors: Yashaswi Verma, C. V. Jawahar



Abstract

Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, the number of images per label varies widely (“class-imbalance”). Additionally, owing to the limitations of human annotation, several images are not annotated with all the relevant labels (“incomplete-labelling”). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that addresses these issues in the image annotation task. The first step of 2PKNN uses “image-to-label” similarities, while the second step uses “image-to-image” similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN, formulated in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label data. In addition to the features provided by Guillaumin et al. (2009), which are used by almost all recent image annotation methods, we benchmark new features, including those extracted from a generic convolutional neural network model and those computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12 and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state-of-the-art on the prevailing image annotation datasets.
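The two-pass scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' implementation (their code is linked in footnote 3): the function name `two_pass_knn`, the Euclidean distance, and the exponential distance weighting are all assumptions made for the sketch. Pass 1 builds a per-label “semantic neighbourhood” so that rare labels contribute candidates on equal footing with frequent ones (countering class-imbalance); pass 2 propagates labels from the pooled candidates with image-to-image similarity weights.

```python
import numpy as np

def two_pass_knn(x, train_feats, train_labels, vocab_size, k1=5, bandwidth=1.0):
    """Illustrative 2-pass kNN annotator (a sketch, not the paper's code).

    Pass 1 ("image-to-label"): for every label, keep the k1 training images
    tagged with that label that are closest to the query.
    Pass 2 ("image-to-image"): propagate labels from the pooled candidates
    with distance-based weights.
    Returns a score per label; the top-scoring labels are the prediction.
    """
    dists = np.linalg.norm(train_feats - x, axis=1)

    # Pass 1: per-label semantic neighbourhoods, pooled into one candidate set.
    candidates = set()
    for label in range(vocab_size):
        has_label = [i for i, labs in enumerate(train_labels) if label in labs]
        candidates.update(sorted(has_label, key=lambda i: dists[i])[:k1])

    # Pass 2: weighted label propagation from the candidate images.
    scores = np.zeros(vocab_size)
    for i in candidates:
        w = np.exp(-dists[i] / bandwidth)  # closer images vote more strongly
        for label in train_labels[i]:
            scores[label] += w
    return scores
```

Because every label is guaranteed at most k1 candidates in pass 1, a label with thousands of training images cannot swamp a label with only a handful, which is the intuition behind using semantic neighbourhoods rather than a single global kNN.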


Footnotes
1
We shall use the terms class/label interchangeably.
 
3
An implementation of 2PKNN and metric learning is available at http://researchweb.iiit.ac.in/~yashaswi.verma/eccv12/2pknn.zip.
 
4
Features indexed by (3), (4), and (9) to (15) in Table 2.
 
References
Anderson, C. (2006). The long tail: Why the future of business is selling less of more. Hyperion.
Ballan, L., Uricchio, T., Seidenari, L., & Bimbo, A. D. (2014). A cross-media model for automatic image annotation. In Proceedings of the ICMR.
Carneiro, G., Chan, A. B., Moreno, P. J., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410.
Chen, M., Zheng, A., & Weinberger, K. Q. (2013). Fast image tagging. In Proceedings of the ICML.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., et al. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of the ICML.
Duygulu, P., Barnard, K., de Freitas, J. F., & Forsyth, D. A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proceedings of the ECCV (pp. 97–112).
Feng, S. L., Manmatha, R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Proceedings of the CVPR (pp. 1002–1009).
Fu, H., Zhang, Q., & Qiu, G. (2012). Random forest for image annotation. In Proceedings of the ECCV (pp. 86–99).
Grubinger, M. (2007). Analysis and evaluation of visual information systems performance. PhD thesis, Victoria University, Melbourne, Australia.
Guillaumin, M., Mensink, T., Verbeek, J. J., & Schmid, C. (2009). TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Proceedings of the ICCV (pp. 309–316).
Gupta, A., Verma, Y., & Jawahar, C. V. (2012). Choosing linguistics over vision to describe images. In Proceedings of the AAAI.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639–2664.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377.
Huiskes, M. J., & Lew, M. S. (2008). The MIR Flickr retrieval evaluation. In MIR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the CVPR (pp. 3304–3311).
Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the ACM SIGIR (pp. 119–126).
Jin, R., Wang, S., & Zhou, Z. H. (2009). Learning a distance metric from multi-instance multi-label data. In Proceedings of the CVPR (pp. 896–902).
Kalayeh, M. M., Idrees, H., & Shah, M. (2014). NMF-KNN: Image annotation using weighted multi-view non-negative matrix factorization. In Proceedings of the CVPR.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In NIPS.
Li, X., Snoek, C. G. M., & Worring, M. (2009). Learning social tag relevance by neighbor voting. IEEE Transactions on Multimedia, 11(7), 1310–1322.
Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42(2), 218–228.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In Proceedings of the ECCV (pp. 316–329).
Makadia, A., Pavlovic, V., & Kumar, S. (2010). Baselines for image annotation. International Journal of Computer Vision, 90(1), 88–105.
Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In Proceedings of the CIVR (pp. 42–50).
Moran, S., & Lavrenko, V. (2011). Optimal tag sets for automatic image annotation. In Proceedings of the BMVC (pp. 1.1–1.11).
Moran, S., & Lavrenko, V. (2014). A sparse kernel relevance model for automatic image annotation. International Journal of Multimedia Information Retrieval, 3(4), 209–229.
Mori, Y., Takahashi, H., & Oka, R. (1999). Image-to-word transformation based on dividing and vector quantizing images with words. In MISRM'99 first international workshop on multimedia intelligent storage and retrieval management.
Murthy, V. N., Can, E. F., & Manmatha, R. (2014). A hybrid model for automatic image annotation. In Proceedings of the ICMR.
Nakayama, H. (2011). Linear distance metric learning for large-scale generic image recognition. PhD thesis, The University of Tokyo, Japan.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In Proceedings of the ECCV (pp. 143–156).
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the ICML (pp. 807–814).
van de Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In Proceedings of the ECCV (pp. 334–348).
Verbeek, J., Guillaumin, M., Mensink, T., & Schmid, C. (2010). Image annotation with TagProp on the MIRFLICKR set. In MIR.
Verma, Y., & Jawahar, C. V. (2012). Image annotation using metric learning in semantic neighbourhoods. In Proceedings of the ECCV (pp. 836–849).
Verma, Y., & Jawahar, C. V. (2013). Exploring SVM for image annotation in presence of confusing labels. In Proceedings of the BMVC.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In SIGCHI conference on human factors in computing systems (pp. 319–326).
Wang, C., Blei, D., & Fei-Fei, L. (2009). Simultaneous image classification and annotation. In Proceedings of the CVPR.
Wang, H., Huang, H., & Ding, C. H. Q. (2011). Image annotation using bi-relational graph of images and semantic labels. In Proceedings of the CVPR (pp. 793–800).
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10, 207–244.
Xiang, Y., Zhou, X., Chua, T. S., & Ngo, C. W. (2009). A revisit of generative model for automatic image annotation using Markov random fields. In Proceedings of the CVPR (pp. 1153–1160).
Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In Proceedings of the CIVR (pp. 507–517).
Zhang, S., Huang, J., Huang, Y., Yu, Y., Li, H., & Metaxas, D. N. (2010). Automatic image annotation using group sparsity. In Proceedings of the CVPR (pp. 3312–3319).
Metadata
Title
Image Annotation by Propagating Labels from Semantic Neighbourhoods
Authors
Yashaswi Verma
C. V. Jawahar
Publication date
12.07.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0927-0
