2017 | Original Paper | Book Chapter

Efficient Cross-modal Retrieval via Discriminative Deep Correspondence Model

Authors: Zhikai Hu, Xin Liu, An Li, Bineng Zhong, Wentao Fan, Jixiang Du

Published in: Computer Vision

Publisher: Springer Singapore


Abstract

Cross-modal retrieval has recently drawn much attention due to the widespread presence of multi-modal data, and it generally involves two challenges: how to model the correlations between different modalities, and how to exploit class label information to bridge their heterogeneity. Most previous works focus on the first challenge and often ignore the second. In this paper, we propose a discriminative deep correspondence model that addresses both. By taking class label information into consideration, the proposed model seamlessly combines a correspondence autoencoder (Corr-AE) with supervised correspondence neural networks (Super-Corr-NN) for cross-modal matching. The former learns correspondence representations of data from different modalities, while the latter is designed to discriminatively reduce the semantic gap between low-level features and high-level descriptions. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed approach in comparison with state-of-the-art competing methods.
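
The abstract describes a two-part objective: a correspondence autoencoder (Corr-AE) term that aligns the hidden codes of two modality-specific autoencoders, and a supervised term that feeds class labels back into those codes. As a rough illustration only, the following Python/PyTorch sketch shows how such a combined loss could be assembled; the single-hidden-layer encoders, the shared classifier, and the weights alpha and beta are assumptions made for the example, not the authors' published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAutoencoder(nn.Module):
    # One autoencoder per modality (e.g. image features or text features).
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = torch.relu(self.enc(x))   # hidden correspondence representation
        return code, self.dec(code)      # code and reconstruction

class DiscriminativeCorrespondenceSketch(nn.Module):
    def __init__(self, img_dim, txt_dim, code_dim, n_classes):
        super().__init__()
        self.img_ae = ModalityAutoencoder(img_dim, code_dim)
        self.txt_ae = ModalityAutoencoder(txt_dim, code_dim)
        self.cls = nn.Linear(code_dim, n_classes)  # hypothetical shared label predictor

    def loss(self, img, txt, labels, alpha=0.2, beta=1.0):
        img_code, img_rec = self.img_ae(img)
        txt_code, txt_rec = self.txt_ae(txt)
        recon = F.mse_loss(img_rec, img) + F.mse_loss(txt_rec, txt)
        corr = F.mse_loss(img_code, txt_code)            # Corr-AE-style code alignment
        sup = (F.cross_entropy(self.cls(img_code), labels)
               + F.cross_entropy(self.cls(txt_code), labels))  # class-label supervision
        return recon + alpha * corr + beta * sup         # illustrative weighting

Under this sketch, retrieval would encode a query with one modality's encoder and rank items of the other modality by the distance between their codes; this is one plausible reading of the abstract, not a description of the paper's exact matching procedure.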

Metadata
Title
Efficient Cross-modal Retrieval via Discriminative Deep Correspondence Model
Authors
Zhikai Hu
Xin Liu
An Li
Bineng Zhong
Wentao Fan
Jixiang Du
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7299-4_55