Published in: International Journal of Computer Vision 1/2017

12.07.2016

Image Annotation by Propagating Labels from Semantic Neighbourhoods

Authors: Yashaswi Verma, C. V. Jawahar



Abstract

Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, the number of images per label varies widely (“class-imbalance”). Additionally, owing to the limitations of human annotation, several images are not annotated with all the relevant labels (“incomplete-labelling”). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that addresses these issues in the image annotation task. The first step of 2PKNN uses “image-to-label” similarities, while the second step uses “image-to-image” similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN, formulated in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label data. In addition to the features provided by Guillaumin et al. (2009), which are used by almost all recent image annotation methods, we benchmark new features, including those extracted from a generic convolutional neural network model and those computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12 and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state-of-the-art on the prevailing image annotation datasets.
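The two-pass scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' implementation (their code is linked in footnote 3): the function name `two_pass_knn`, the Euclidean distance, and the exponential distance weighting are all assumptions made for the sketch. Pass 1 builds a per-label “semantic neighbourhood” so that rare labels contribute candidates on equal footing with frequent ones (countering class-imbalance); pass 2 propagates labels from the pooled candidates with image-to-image similarity weights.

```python
import numpy as np

def two_pass_knn(x, train_feats, train_labels, vocab_size, k1=5, bandwidth=1.0):
    """Illustrative 2-pass kNN annotator (a sketch, not the paper's code).

    Pass 1 ("image-to-label"): for every label, keep the k1 training images
    tagged with that label that are closest to the query.
    Pass 2 ("image-to-image"): propagate labels from the pooled candidates
    with distance-based weights.
    Returns a score per label; the top-scoring labels are the prediction.
    """
    dists = np.linalg.norm(train_feats - x, axis=1)

    # Pass 1: per-label semantic neighbourhoods, pooled into one candidate set.
    candidates = set()
    for label in range(vocab_size):
        has_label = [i for i, labs in enumerate(train_labels) if label in labs]
        candidates.update(sorted(has_label, key=lambda i: dists[i])[:k1])

    # Pass 2: weighted label propagation from the candidate images.
    scores = np.zeros(vocab_size)
    for i in candidates:
        w = np.exp(-dists[i] / bandwidth)  # closer images vote more strongly
        for label in train_labels[i]:
            scores[label] += w
    return scores
```

Because every label is guaranteed at most k1 candidates in pass 1, a label with thousands of training images cannot swamp a label with only a handful, which is the intuition behind using semantic neighbourhoods rather than a single global kNN.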


Footnotes
1
We shall use the terms class/label interchangeably.
 
3
An implementation of 2PKNN and metric learning is available at http://researchweb.iiit.ac.in/~yashaswi.verma/eccv12/2pknn.zip.
 
4
Features indexed by (3), (4), and (9) to (15) in Table 2.
 
References
Anderson, C. (2006). The long tail: Why the future of business is selling less of more. Hyperion.
Ballan, L., Uricchio, T., Seidenari, L., & Bimbo, A. D. (2014). A cross-media model for automatic image annotation. In Proceedings of the ICMR.
Carneiro, G., Chan, A. B., Moreno, P. J., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410.
Chen, M., Zheng, A., & Weinberger, K. Q. (2013). Fast image tagging. In Proceedings of the ICML.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., et al. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of the ICML.
Duygulu, P., Barnard, K., de Freitas, J. F., & Forsyth, D. A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proceedings of the ECCV (pp. 97–112).
Feng, S. L., Manmatha, R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Proceedings of the CVPR (pp. 1002–1009).
Fu, H., Zhang, Q., & Qiu, G. (2012). Random forest for image annotation. In Proceedings of the ECCV (pp. 86–99).
Grubinger, M. (2007). Analysis and evaluation of visual information systems performance. PhD thesis, Victoria University, Melbourne, Australia.
Guillaumin, M., Mensink, T., Verbeek, J. J., & Schmid, C. (2009). TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Proceedings of the ICCV (pp. 309–316).
Gupta, A., Verma, Y., & Jawahar, C. V. (2012). Choosing linguistics over vision to describe images. In Proceedings of the AAAI.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639–2664.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377.
Huiskes, M. J., & Lew, M. S. (2008). The MIR Flickr retrieval evaluation. In MIR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the CVPR (pp. 3304–3311).
Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the ACM SIGIR (pp. 119–126).
Jin, R., Wang, S., & Zhou, Z. H. (2009). Learning a distance metric from multi-instance multi-label data. In Proceedings of the CVPR (pp. 896–902).
Kalayeh, M. M., Idrees, H., & Shah, M. (2014). NMF-KNN: Image annotation using weighted multi-view non-negative matrix factorization. In Proceedings of the CVPR.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In NIPS.
Li, X., Snoek, C. G. M., & Worring, M. (2009). Learning social tag relevance by neighbor voting. IEEE Transactions on Multimedia, 11(7), 1310–1322.
Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42(2), 218–228.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In Proceedings of the ECCV (pp. 316–329).
Makadia, A., Pavlovic, V., & Kumar, S. (2010). Baselines for image annotation. International Journal of Computer Vision, 90(1), 88–105.
Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In Proceedings of the CIVR (pp. 42–50).
Moran, S., & Lavrenko, V. (2011). Optimal tag sets for automatic image annotation. In Proceedings of the BMVC (pp. 1.1–1.11).
Moran, S., & Lavrenko, V. (2014). A sparse kernel relevance model for automatic image annotation. International Journal of Multimedia Information Retrieval, 3(4), 209–229.
Mori, Y., Takahashi, H., & Oka, R. (1999). Image-to-word transformation based on dividing and vector quantizing images with words. In MISRM'99 first international workshop on multimedia intelligent storage and retrieval management.
Murthy, V. N., Can, E. F., & Manmatha, R. (2014). A hybrid model for automatic image annotation. In Proceedings of the ICMR.
Nakayama, H. (2011). Linear distance metric learning for large-scale generic image recognition. PhD thesis, The University of Tokyo, Japan.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In Proceedings of the ECCV (pp. 143–156).
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the ICML (pp. 807–814).
van de Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In Proceedings of the ECCV (pp. 334–348).
Verbeek, J., Guillaumin, M., Mensink, T., & Schmid, C. (2010). Image annotation with TagProp on the MIRFLICKR set. In MIR.
Verma, Y., & Jawahar, C. V. (2012). Image annotation using metric learning in semantic neighbourhoods. In Proceedings of the ECCV (pp. 836–849).
Verma, Y., & Jawahar, C. V. (2013). Exploring SVM for image annotation in presence of confusing labels. In Proceedings of the BMVC.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In SIGCHI conference on human factors in computing systems (pp. 319–326).
Wang, C., Blei, D., & Fei-Fei, L. (2009). Simultaneous image classification and annotation. In Proceedings of the CVPR.
Wang, H., Huang, H., & Ding, C. H. Q. (2011). Image annotation using bi-relational graph of images and semantic labels. In Proceedings of the CVPR (pp. 793–800).
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10, 207–244.
Xiang, Y., Zhou, X., Chua, T. S., & Ngo, C. W. (2009). A revisit of generative model for automatic image annotation using Markov random fields. In Proceedings of the CVPR (pp. 1153–1160).
Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In Proceedings of the CIVR (pp. 507–517).
Zhang, S., Huang, J., Huang, Y., Yu, Y., Li, H., & Metaxas, D. N. (2010). Automatic image annotation using group sparsity. In Proceedings of the CVPR (pp. 3312–3319).
Metadata
Title
Image Annotation by Propagating Labels from Semantic Neighbourhoods
Authors
Yashaswi Verma
C. V. Jawahar
Publication date
12.07.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0927-0
