Published in: Multimedia Systems 6/2014

1 November 2014 | Regular Paper

Relative image similarity learning with contextual information for Internet cross-media retrieval

Authors: Shuqiang Jiang, Xinhang Song, Qingming Huang

Abstract

With the rapid growth of image data on the Internet, using it efficiently in cross-media scenarios has become an urgent problem. Images are usually accompanied by contextual textual information, and the two heterogeneous modalities reinforce each other to make Internet content more informative. In most cases, visual information can be regarded as an enhancement of the textual document. To make image-to-image similarity more consistent with document-to-document similarity, this paper proposes a method that learns image similarities from the relations among the accompanying textual documents. More specifically, instead of using static quantitative relations, the paper adopts a rank-based learning procedure built on structural SVM, in which the ranking structure is established by comparing the relative relations of the textual information. The learned similarities accord more closely with human perception. The proposed method can be used not only for image-to-image retrieval but also for cross-modality multimedia retrieval, for which a query expansion framework is proposed to obtain more satisfactory results. Extensive experimental evaluations on a large-scale Internet dataset validate the performance of the proposed methods.
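The idea of learning image similarity from relative text relations can be illustrated with a small sketch. It is not the paper's algorithm: the abstract names structural SVM, whereas the code below swaps in a simpler triplet hinge loss trained by subgradient descent. The function names, the cosine text similarity, the bilinear form x_i^T W x_j, and all parameter values (`gap`, `lr`, `reg`, `epochs`) are assumptions made for illustration only.

```python
import numpy as np

def cosine_sim(feats):
    """Pairwise cosine similarity between the rows of feats."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)
    return unit @ unit.T

def text_triplets(text_feats, gap=0.1):
    """Relative constraints (i, j, k): document j is more similar to
    document i than document k is, by at least `gap` (assumed threshold)."""
    S = cosine_sim(text_feats)
    n = S.shape[0]
    return [(i, j, k)
            for i in range(n) for j in range(n) for k in range(n)
            if len({i, j, k}) == 3 and S[i, j] > S[i, k] + gap]

def learn_similarity(img_feats, triplets, lr=0.1, reg=1e-3, epochs=100):
    """Learn W so that x_i^T W x_j exceeds x_i^T W x_k by a unit margin
    for each triplet. Simplified stand-in for a structural SVM solver."""
    d = img_feats.shape[1]
    W = np.eye(d)  # start from the plain inner-product similarity
    for _ in range(epochs):
        grad = reg * W  # gradient of the L2 regulariser (up to a constant)
        for i, j, k in triplets:
            xi, xj, xk = img_feats[i], img_feats[j], img_feats[k]
            margin = xi @ W @ xj - xi @ W @ xk
            if margin < 1.0:  # hinge is active: add its subgradient
                grad -= np.outer(xi, xj - xk) / len(triplets)
        W -= lr * grad
    return W

def ranking_accuracy(img_feats, W, triplets):
    """Fraction of triplets whose ordering the image similarity reproduces."""
    S = img_feats @ W @ img_feats.T
    return float(np.mean([S[i, j] > S[i, k] for i, j, k in triplets]))
```

A typical use would be to build the triplets from the text features of a training set, call `learn_similarity` on the corresponding image features, and compare `ranking_accuracy` under the identity matrix and under the learned `W`. Only the ordering of text similarities enters the constraints, not their absolute values, which is the sense in which the supervision is relative rather than a static quantitative target.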

Metadata
Title: Relative image similarity learning with contextual information for Internet cross-media retrieval
Authors: Shuqiang Jiang, Xinhang Song, Qingming Huang
Publication date: 1 November 2014
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems, Issue 6/2014
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-012-0299-4
