2016 | Original Paper | Book Chapter

Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval

Authors: Hui Zou, Ji-Xiang Du, Chuan-Min Zhai, Jing Wang

Published in: Intelligent Computing Theories and Application

Publisher: Springer International Publishing

Abstract

A growing variety of multimedia information on the Internet, including text, voice, video, and images, is used together to describe the same semantic concept. This paper presents a new method for more efficient cross-modal multimedia retrieval. Using images and text as an example, we learn deep features of images with convolutional neural networks and learn text features with a latent Dirichlet allocation model. We then map the two feature spaces into a shared representation space with a probability model so that they are isomorphic. Finally, we adopt centered correlation to measure the distance between them. Experimental results on the Wikipedia dataset show that our approach achieves state-of-the-art results.
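
The final retrieval step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define centered correlation, so this example assumes the common reading: one minus the Pearson correlation of the two mean-centered vectors. The function names `centered_correlation_distance` and `rank_by_distance` are hypothetical, and the vectors stand in for image and text representations that have already been mapped into the shared space.

```python
import math


def centered_correlation_distance(u, v):
    """Return 1 - Pearson correlation of u and v (assumed definition)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Subtract each vector's own mean before comparing directions.
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cu = [x - mean_u for x in u]
    cv = [x - mean_v for x in v]
    norm_u = math.sqrt(sum(x * x for x in cu))
    norm_v = math.sqrt(sum(x * x for x in cv))
    # A constant vector has no direction after centering; treat it as
    # uncorrelated (an assumption made for this sketch).
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    corr = sum(a * b for a, b in zip(cu, cv)) / (norm_u * norm_v)
    return 1.0 - corr


def rank_by_distance(query, candidates):
    """Return candidate indices sorted from closest to farthest."""
    return sorted(
        range(len(candidates)),
        key=lambda i: centered_correlation_distance(query, candidates[i]),
    )
```

For cross-modal retrieval, `query` would be the shared-space vector of one modality (for example, a text) and `candidates` the shared-space vectors of the other modality (for example, images); the returned order is the retrieval ranking under this assumed distance.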

Print ISBN: 978-3-319-42293-0
Electronic ISBN: 978-3-319-42294-7
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-42294-7_28
