Published in: International Journal of Computer Vision 2/2014

01-01-2014

A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

Authors: Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik

Abstract

This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features and use explicit nonlinear kernel mappings to efficiently approximate kernel CCA. To perform retrieval, we use a specially designed similarity function in the embedded space, which substantially outperforms the Euclidean distance. The resulting system produces compelling qualitative results and outperforms a number of two-view baselines on retrieval tasks on three large-scale Internet image datasets.
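
As a rough illustration of the two-view CCA baseline that the abstract starts from, the sketch below fits a regularized linear CCA on synthetic data with NumPy. The names `fit_cca` and `inv_sqrt`, the regularizer `reg`, and the toy data are assumptions made for this example. It is not the authors' implementation, and it does not reproduce the paper's third view, its explicit kernel mappings, or its retrieval similarity function.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T  # V diag(1/sqrt(w)) V^T

def fit_cca(X, Y, dim, reg=1e-3):
    """Two-view linear CCA via whitening followed by an SVD.

    X: (n, dx) features of view 1 (e.g. visual); Y: (n, dy) features of view 2 (e.g. tags).
    Assumes dim <= min(dx, dy). Returns projections Wx, Wy and the top `dim`
    canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :dim]
    Wy = Ky @ Vt.T[:, :dim]
    return Wx, Wy, s[:dim]

# Toy data: both views are noisy linear functions of a shared 2-D latent variable.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
Wx, Wy, corrs = fit_cca(X, Y, dim=2)
```

After fitting, mean-centered samples from either view can be projected with `Wx` or `Wy` into the shared space, where cross-view nearest-neighbor search becomes possible; how neighbors are scored in that space is a separate design choice that this sketch leaves open.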


Footnotes
1. It can be shown that CCA with labels as one of the views is equivalent to Linear Discriminant Analysis (LDA) (Bartlett 1938).
 
Literature
Ando, R. K., & Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817–1853.
Bach, F. R., & Jordan, M. I. (2002). Kernel independent component analysis. Journal of Machine Learning Research, 3, 1–48.
Barnard, K., & Forsyth, D. (2001). Learning the semantics of words and pictures. In ICCV (Vol. 2, pp. 408–415).
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Mathematical Proceedings of the Cambridge Philosophical Society, 34(1), 33–40.
Berg, T., & Forsyth, D. (2006). Animals on the web. In CVPR.
Berg, T. L., & Berg, A. C. (2009). Finding iconic images. In Second workshop on Internet vision at CVPR.
Blaschko, M., & Lampert, C. (2008). Correlational spectral clustering. In CVPR.
Blei, D., & Jordan, M. (2003). Modeling annotated data. In ACM SIGIR (pp. 127–134).
Blei, D., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Carneiro, G., Chan, A., Moreno, P., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. In PAMI.
Chapelle, O., Weston, J., & Schölkopf, B. (2003). Cluster kernels for semi-supervised learning. In NIPS.
Chen, N., Zhu, J., Sun, F., & Xing, E. P. (2012). Large-margin predictive latent subspace learning for multi-view data analysis. In PAMI.
Chen, X., Yuan, X.-T., Chen, Q., Yan, S., & Chua, T.-S. (2011). Multi-label visual classification with label exclusive context. In ICCV.
Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y.-T. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of ACM conference on image and video retrieval (CIVR’09), Santorini, Greece.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.
Datta, R., Joshi, D., Li, J., & Wang, J. Z. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2), 1–60.
Deng, J., Dong, W., Socher, R., Li, L., & Li, K. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Duygulu, P., Barnard, K., de Freitas, N., & Forsyth, D. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV.
Fan, J., Shen, Y., Zhou, N., & Gao, Y. (2010). Harvesting large-scale weakly-tagged image databases from the web. In CVPR (pp. 802–809).
Farhadi, A., Hejrati, M., Sadeghi, A., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. A. (2010). Every picture tells a story: Generating sentences for images. In ECCV.
Foster, D. P., Johnson, R., Kakade, S. M., & Zhang, T. (2010). Multi-view dimensionality reduction via canonical correlation analysis. Tech Report. Rutgers University.
Frankel, C., Swain, M. J., & Athitsos, V. (1997). Webseer: An image search engine for the World Wide Web. In CVPR.
Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In ICCV.
Gong, Y., & Lazebnik, S. (2011). Iterative quantization: A procrustean approach to learning binary codes. In CVPR.
Globerson, A., & Roweis, S. (2005). Metric learning by collapsing classes. In NIPS.
Goldberger, J., Roweis, S., & Hinton, G. (2004). Neighbourhood components analysis. In NIPS.
Grangier, D., & Bengio, S. (2008). A discriminative kernel-based model to rank images from text queries. In PAMI.
Grubinger, M., Clough, P. D., Müller, H., & Deselaers, T. (2006). The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In Proceedings of the international workshop OntoImage’2006 language resources for content-based image retrieval (pp. 13–23).
Gordo, A., Rodriguez-Serrano, J., Perronnin, F., & Valveny, E. (2012). Leveraging category-level labels for instance-level image retrieval. In CVPR.
Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In ICCV.
Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In CVPR.
Hardoon, D., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application. Neural Computation, 16(12), 2639–2664.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In SIGIR.
Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 312–377.
Hsu, D., Kakade, S., Langford, J., & Zhang, T. (2009). Multi-label prediction via compressed sensing. In NIPS.
Hwang, S. J., & Grauman, K. (2010). Accounting for the relative importance of objects in image retrieval. In BMVC.
Hwang, S. J., & Grauman, K. (2011). Learning the relative importance of objects from tagged images for retrieval and cross-modal search. In IJCV.
Krapac, J., Allan, M., Verbeek, J., & Jurie, F. (2010). Improving web-image search results using query-relative classifiers. In CVPR.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Tech Report. University of Toronto.
Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., & Berg, T. L. (2011). Babytalk: Understanding and generating simple image descriptions. In CVPR.
Larsen, R. M. (1998). Lanczos bidiagonalization with partial reorthogonalization. Technical report, Department of Computer Science, Aarhus University.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In NIPS.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.
Li, J., & Wang, J. (2008). Real-time computerized annotation of pictures. In PAMI.
Liu, C., Yuen, J., & Torralba, A. (2010). SIFT flow: Dense correspondence across different scenes. In PAMI.
Liu, Y., Xu, D., Tsang, I., & Luo, J. (2009). Using large-scale web data to facilitate textual query based retrieval of consumer photos. In ACM MM.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. In IJCV.
Lucchi, A., & Weston, J. (2012). Joint image and word sense discrimination for image retrieval. In ECCV.
Maji, S., & Berg, A. (2009). Max-margin additive classifiers for detection. In CVPR.
Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In ECCV.
Mensink, T., Verbeek, J., Csurka, G., & Perronnin, F. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV.
Monay, F., & Gatica-Perez, D. (2004). PLSA-based image auto-annotation: Constraining the latent space. In ACM Multimedia.
Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In NIPS.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. In IJCV.
Ordonez, V., Kulkarni, G., & Berg, T. L. (2011). Im2text: Describing images using 1 million captioned photographs. In NIPS.
Perronnin, F., Sanchez, J., & Liu, Y. (2010). Large-scale image categorization with explicit data embedding. In CVPR.
Quadrianto, N., & Lampert, C. H. (2011). Learning multi-view neighborhood preserving projections. In ICML.
Quattoni, A., Collins, M., & Darrell, T. (2007). Learning visual representations using images with captions. In CVPR.
Raguram, R., & Lazebnik, S. (2008). Computing iconic summaries for general visual concepts. In First workshop on Internet vision at CVPR.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. In NIPS.
Rai, P., & Daumé, H. (2009). Multi-label prediction via sparse infinite CCA. In NIPS.
Rasiwasia, N., & Vasconcelos, N. (2007). Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia, 9(5), 923–938.
Rasiwasia, N., Pereira, J. C., Coviello, E., Doyle, G., Lanckriet, G., Levy, R., et al. (2010). A new approach to cross-modal multimedia retrieval. In ACM MM.
Schölkopf, B., Smola, A., & Müller, K.-R. (1997). Kernel principal component analysis. In ICANN.
Schroff, F., Criminisi, A., & Zisserman, A. (2007). Harvesting image databases from the Web. In ICCV.
Sharma, A., Kumar, A., Daumé, H., & Jacobs, D. (2012). Generalized multiview analysis: A discriminative latent space. In CVPR.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. In PAMI.
Smeulders, A. W., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. The IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Tighe, J., & Lazebnik, S. (2010). Superparsing: Scalable nonparametric image parsing with superpixels. In ECCV.
Udupa, R., & Khapra, M. (2010). Improving the multilingual user experience of Wikipedia using cross-language name search. In NAACL.
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. In PAMI.
Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In CVPR.
Verma, Y., & Jawahar, C. V. (2012). Image annotation using metric learning in semantic neighbourhoods. In ECCV.
Vinokourov, A., Shawe-Taylor, J., & Cristianini, N. (2002). Inferring a semantic representation of text via cross-language correlation analysis. In NIPS.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In ACM SIGCHI.
Wang, C., Blei, D., & Li, F. (2009a). Simultaneous image classification and annotation. In CVPR (pp. 1903–1910).
Wang, G., Hoiem, D., & Forsyth, D. (2009b). Building text features for object image classification. In CVPR.
Wang, G., Hoiem, D., & Forsyth, D. (2009c). Learning image similarity from Flickr groups using stochastic intersection kernel machines. In ICCV.
Wang, X.-J., Zhang, L., Li, X., & Ma, W.-Y. (2008). Annotating images by mining image search results. The IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1919–1932.
Weston, J., Bengio, S., & Usunier, N. (2011). Wsabie: Scaling up to large vocabulary image annotation. In IJCAI.
Wei, X., & Croft, W. B. (2006). LDA-based document models for ad-hoc retrieval. In SIGIR.
Weinberger, K., Blitzer, J., & Saul, L. (2005). Distance metric learning for large margin nearest neighbor classification. In NIPS.
Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In CVPR.
Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In SIGIR.
Yakhnenko, O., & Honavar, V. (2009). Multiple label prediction for image annotation with multiple kernel correlation models. In Workshop on visual context learning (in conjunction with CVPR).
Zhang, Y., & Schneider, J. (2011). Multi-label output codes using canonical correlation analysis. In AISTATS.
Zhu, S., Ji, X., Xu, W., & Gong, Y. (2005). Multi-labelled classification using maximum entropy method. In ACM SIGIR.
Metadata
Title: A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics
Authors: Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik
Publication date: 01-01-2014
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-013-0658-4
