
2016 | Original Paper | Book Chapter

Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval

Authors: Hui Zou, Ji-Xiang Du, Chuan-Min Zhai, Jing Wang

Published in: Intelligent Computing Theories and Application

Publisher: Springer International Publishing


Abstract

A growing amount of multimedia information in different modalities, including text, voice, video, and images, is used together on the Internet to describe the same semantic concept. This paper presents a new method for more efficient cross-modal multimedia retrieval. Using images and text as an example, we learn deep features of images with convolutional neural networks and learn text features with a latent Dirichlet allocation model. We then map the two feature spaces into a shared representation space with a probabilistic model so that they are isomorphic. Finally, we adopt centered correlation to measure the distance between them. Experimental results on the Wikipedia dataset show that our approach can achieve state-of-the-art results.
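
The final step of the abstract, comparing items in the shared space with centered correlation, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the common Pearson-style definition (the cosine of the mean-centered vectors). The function names and the three-dimensional example vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def centered_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation of two vectors after subtracting each vector's own mean.

    Assumed definition (not confirmed by the abstract):
    (x - mean(x)) . (y - mean(y)) / (||x - mean(x)|| * ||y - mean(y)||)
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [a - mean_x for a in x]
    yc = [b - mean_y for b in y]
    norm = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    if norm == 0.0:
        # A constant vector has no variation to correlate with.
        return 0.0
    return sum(a * b for a, b in zip(xc, yc)) / norm


def rank_by_centered_correlation(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[int]:
    """Return candidate indices ordered from most to least correlated."""
    scores = [centered_correlation(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical shared-space vectors: one image query, three text candidates.
image_query = [0.7, 0.2, 0.1]
text_candidates = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]
print(rank_by_centered_correlation(image_query, text_candidates))
```

Because both modalities are assumed to live in the same space, the same function would serve for text-to-image queries by swapping the roles of query and candidates. A distance, if one is needed, could be taken as one minus the correlation.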

Metadata
Title
Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval
Authors
Hui Zou
Ji-Xiang Du
Chuan-Min Zhai
Jing Wang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-42294-7_28
