2017 | Original Paper | Book Chapter

Efficient Cross-modal Retrieval via Discriminative Deep Correspondence Model

Authors: Zhikai Hu, Xin Liu, An Li, Bineng Zhong, Wentao Fan, Jixiang Du

Published in: Computer Vision

Publisher: Springer Singapore


Abstract

Cross-modal retrieval has recently drawn much attention due to the widespread presence of multi-modal data, and it generally involves two challenges: how to model the correlations between different modalities, and how to exploit class label information to bridge their heterogeneity. Most previous works focus on the first challenge and often ignore the second. In this paper, we propose a discriminative deep correspondence model that addresses both. By taking class label information into consideration, the proposed model seamlessly combines a correspondence autoencoder (Corr-AE) with supervised correspondence neural networks (Super-Corr-NN) for cross-modal matching. The former learns correspondence representations of data from different modalities, while the latter is designed to discriminatively reduce the semantic gap between low-level features and high-level descriptions. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed approach in comparison with state-of-the-art competing methods.
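
The abstract describes a two-part objective: a correspondence autoencoder (Corr-AE) term that aligns the hidden codes of two modality-specific autoencoders, and a supervised term that feeds class labels back into those codes. As a rough illustration only, the following Python/PyTorch sketch shows how such a combined loss could be assembled; the single-hidden-layer encoders, the shared classifier, and the weights alpha and beta are assumptions made for the example, not the authors' published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAutoencoder(nn.Module):
    # One autoencoder per modality (e.g. image features or text features).
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = torch.relu(self.enc(x))   # hidden correspondence representation
        return code, self.dec(code)      # code and reconstruction

class DiscriminativeCorrespondenceSketch(nn.Module):
    def __init__(self, img_dim, txt_dim, code_dim, n_classes):
        super().__init__()
        self.img_ae = ModalityAutoencoder(img_dim, code_dim)
        self.txt_ae = ModalityAutoencoder(txt_dim, code_dim)
        self.cls = nn.Linear(code_dim, n_classes)  # hypothetical shared label predictor

    def loss(self, img, txt, labels, alpha=0.2, beta=1.0):
        img_code, img_rec = self.img_ae(img)
        txt_code, txt_rec = self.txt_ae(txt)
        recon = F.mse_loss(img_rec, img) + F.mse_loss(txt_rec, txt)
        corr = F.mse_loss(img_code, txt_code)            # Corr-AE-style code alignment
        sup = (F.cross_entropy(self.cls(img_code), labels)
               + F.cross_entropy(self.cls(txt_code), labels))  # class-label supervision
        return recon + alpha * corr + beta * sup         # illustrative weighting

Under this sketch, retrieval would encode a query with one modality's encoder and rank items of the other modality by the distance between their codes; this is one plausible reading of the abstract, not a description of the paper's exact matching procedure.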

Metadata
Title
Efficient Cross-modal Retrieval via Discriminative Deep Correspondence Model
Authors
Zhikai Hu
Xin Liu
An Li
Bineng Zhong
Wentao Fan
Jixiang Du
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7299-4_55