Published in: International Journal of Computer Vision 3/2013

1 September 2013

Optimization of Robust Loss Functions for Weakly-Labeled Image Taxonomies

Authors: Julian J. McAuley, Arnau Ramisa, Tibério S. Caetano

Abstract

The recently proposed ImageNet dataset consists of several million images, each annotated with a single object category. These annotations may be imperfect, in the sense that many images contain multiple objects belonging to the label vocabulary. In other words, we have a multi-label problem but the annotations include only a single label (which is not necessarily the most prominent). Such a setting motivates the use of a robust evaluation measure, which allows for a limited number of labels to be predicted and, so long as one of the predicted labels is correct, the overall prediction should be considered correct. This is indeed the type of evaluation measure used to assess algorithm performance in a recent competition on ImageNet data. Optimizing such types of performance measures presents several hurdles even with existing structured output learning methods. Indeed, many of the current state-of-the-art methods optimize the prediction of only a single output label, ignoring this ‘structure’ altogether. In this paper, we show how to directly optimize continuous surrogates of such performance measures using structured output learning techniques with latent variables. We use the output of existing binary classifiers as input features in a new learning stage which optimizes the structured loss corresponding to the robust performance measure. We present empirical evidence that this allows us to ‘boost’ the performance of binary classification on a variety of weakly-supervised labeling problems defined on image taxonomies.
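
The robust evaluation measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_k_loss` and `mean_top_k_loss`, the score vectors, and the choice of a plain 0/1 loss are assumptions made for illustration only. A prediction of K labels counts as correct when at least one predicted label appears among the groundtruth labels.

```python
from typing import Sequence, Set


def top_k_loss(scores: Sequence[float], true_labels: Set[int], k: int) -> int:
    """Return 0 if any of the k highest-scoring labels is a groundtruth label, else 1.

    `scores[c]` is a hypothetical classifier score for label c (an assumption
    of this sketch, e.g. the output of a binary classifier for that label).
    """
    # Rank label indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    predicted = ranked[:k]
    # The prediction is counted as correct if at least one predicted label matches.
    return 0 if any(c in true_labels for c in predicted) else 1


def mean_top_k_loss(all_scores, all_labels, k: int) -> float:
    """Average the 0/1 top-k loss over a collection of images."""
    losses = [top_k_loss(s, y, k) for s, y in zip(all_scores, all_labels)]
    return sum(losses) / len(losses)


# Toy example: four labels, the single annotated label is label 2.
scores = [0.1, 0.7, 0.4, 0.9]
print(top_k_loss(scores, {2}, k=1))  # only label 3 is predicted
print(top_k_loss(scores, {2}, k=3))  # labels 3, 1 and 2 are predicted
```

Because this loss is piecewise constant in the classifier parameters, it cannot be minimized directly by gradient methods, which is why the paper optimizes continuous surrogates instead.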

Footnotes
1
Note that in McAuley et al. (2011) we assumed that there was only a single groundtruth label y^n for each image, as is the case for ImageNet. In the case of the MIR and ImageCLEF datasets there are a variable number (possibly zero) of groundtruth labels for each image, hence the change of notation.
 
2
There are countably many values for the loss but uncountably many values for the parameters, so there are large equivalence classes of parameters that correspond to precisely the same loss.
 
3
Note that somewhat simpler notation was used in McAuley et al. (2011), in which there was only a single output label y^n, but otherwise the idea remains the same.
 
References
Bart, E., Porteous, I., Perona, P., & Welling, M. (2008). Unsupervised learning of visual taxonomies. In IEEE conference on computer vision and pattern recognition.
Binder, A., Müller, K.-R., & Kawanabe, M. (2011). On taxonomies for multi-class image categorization. International Journal of Computer Vision, 1–21.
Blaschko, M., Vedaldi, A., & Zisserman, A. (2010). Simultaneous object detection and ranking with weak supervision. In Advances in neural information processing systems.
Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In Advances in neural information processing systems.
Bucak, S. S., Jin, R., & Jain, A. K. (2011). Multi-label learning with incomplete class assignments. In IEEE conference on computer vision and pattern recognition.
Cai, L., & Hofmann, T. (2004). Hierarchical document categorization with support vector machines. In Conference on information and knowledge management.
Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV workshop on statistical learning in computer vision.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition.
Deng, J., Berg, A. C., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In European conference on computer vision.
Deng, J., Berg, A. C., & Fei-Fei, L. (2011). Hierarchical semantic indexing for large scale image retrieval. In IEEE conference on computer vision and pattern recognition.
Deselaers, T., & Ferrari, V. (2011). Visual and semantic similarity in imagenet. In IEEE conference on computer vision and pattern recognition.
Dimitrovski, I., Kocev, D., Loskovska, S., & Džeroski, S. (2010). Detection of visual concepts and annotation of images using ensembles of trees for hierarchical multi-label classification. In Recognizing patterns in signals, speech, images and videos (pp. 152–161).
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report 7694, California Institute of Technology.
Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation. In International conference on computer vision.
Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In IEEE conference on computer vision and pattern recognition.
Huiskes, M. J., & Lew, M. S. (2008). The MIR Flickr retrieval evaluation. In International conference on multimedia information retrieval.
Huiskes, M. J., Thomee, B., & Lew, M. S. (2010). New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In International conference on multimedia information retrieval.
Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128.
Kawanabe, M., Binder, A., Muller, C., & Wojcikiewicz, W. (2011). Multi-modal visual concept classification of images via Markov random walk over tags. In IEEE workshop on applications of computer vision.
Kim, B. S., Park, J. Y., Mohan, A., Gilbert, A., & Savarese, S. (2011). Hierarchical classification of images by sparse approximation. In British machine vision conference.
Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by Between-class attribute transfer.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In Advances in neural information processing systems.
Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., & Yu, K. (2011). Large-scale image classification: fast feature extraction and SVM training. In IEEE conference on computer vision and pattern recognition.
Marszałek, M., & Schmid, C. (2008). Constructing category hierarchies for visual recognition. In European conference in computer vision.
McAuley, J., Ramisa, A., & Caetano, T. (2011). Optimization of robust loss functions for weakly-labeled image taxonomies: an ImageNet case study. In Energy minimization methods in computer vision and pattern recognition.
Mensink, T., Verbeek, J., & Csurka, G. (2011). Learning structured prediction models for interactive image labeling. In IEEE conference on computer vision and pattern recognition.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38, 39–41.
Moran, S., & Lavrenko, V. (2011). Optimal tag sets for automatic image annotation. In British machine vision conference.
Nowak, S., & Huiskes, M. J. (2010). New strategies for image annotation: overview of the photo annotation task at ImageCLEF. In CLEF (notebook Papers/LABs/Workshops).
Nowak, S., Nagel, K., & Liebetrau, J. (2011). The CLEF 2011 photo annotation and concept-based retrieval tasks. Working Notes of CLEF.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In European conference on computer vision.
Russakovsky, O., & Fei-Fei, L. (2010). Attribute learning in large-scale datasets. In ECCV workshop on parts and attributes.
Sánchez, J., & Perronnin, F. (2011). High-Dimensional signature compression for Large-Scale image classification. In IEEE conference on computer vision and pattern recognition.
Setia, L., & Burkhardt, H. (2007). Learning taxonomies in large image databases. In ACM SIGIR workshop on multimedia information retrieval.
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., Strehl, A., & Vishwanathan, V. (2009). Hash kernels. In Artificial intelligence and statistics.
Sivic, J., Russell, B. C., Zisserman, A., Freeman, W. T., & Efros, A. A. (2008). Unsupervised discovery of visual object class hierarchies. In IEEE conference on computer vision and pattern recognition.
Teo, C. H., Smola, A., Vishwanathan, S. V. N., & Le, Q. V. (2007). A scalable modular convex solver for regularized risk minimization. In Knowledge discovery and data mining.
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.
Verbeek, J., Guillaumin, M., Mensink, T., & Schmid, C. (2010). Image annotation with TagProp on the MIR flickr set. In International conference on multimedia information retrieval.
Wang, H., Huang, H., & Ding, C. (2011). Image annotation using bi-relational graph of images and semantic labels. In IEEE conference on computer vision and pattern recognition.
Yu, C.-N., & Joachims, T. (2008). Training structural svms with kernels using sampled cuts. In Knowledge discovery and data mining.
Yu, C.-N., & Joachims, T. (2009). Learning structural SVMs with latent variables. In International conference on machine learning.
Yuille, A., & Rangarajan, A. (2002). The concave-convex procedure (CCCP). In Advances in neural information processing systems.
Metadata
Title
Optimization of Robust Loss Functions for Weakly-Labeled Image Taxonomies
Authors
Julian J. McAuley
Arnau Ramisa
Tibério S. Caetano
Publication date
1 September 2013
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 3/2013
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-012-0561-4
