research-article

Effective multi-modal retrieval based on stacked auto-encoders

Authors:
Wei Wang (National University of Singapore, Singapore),
Beng Chin Ooi (National University of Singapore, Singapore),
Xiaoyan Yang (Illinois at Singapore Pte, Singapore),
Dongxiang Zhang (National University of Singapore, Singapore),
Yueting Zhuang (Zhejiang University, China)

Proceedings of the VLDB Endowment, Volume 7, Issue 8, pp. 649–660. https://doi.org/10.14778/2732296.2732301

Published: 01 April 2014


Abstract

Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval from various types of media. For example, users can simply snap a movie poster to search for relevant reviews and trailers. To support such queries, a set of mapping functions is learned to project high-dimensional features extracted from data of different media types into a common low-dimensional space so that metric distance measures can be applied. In this paper, we propose an effective mapping mechanism based on deep learning (i.e., stacked auto-encoders) for multi-modal retrieval. Mapping functions are learned by optimizing a new objective function, which effectively captures both intra-modal and inter-modal semantic relationships of data from heterogeneous sources. Compared with previous work, which requires a substantial amount of prior knowledge such as similarity matrices of intra-modal data and ranking examples, our method requires little prior knowledge. Given a large training dataset, we split it into mini-batches and continually adjust the mapping functions for each batch of input. Hence, our method is memory efficient with respect to the data volume. Experiments on three real datasets show that our proposed method achieves a significant improvement in search accuracy over state-of-the-art methods.
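The mapping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one single-layer linear auto-encoder per modality instead of stacked auto-encoders, a squared-error objective that adds an intra-modal reconstruction term to an inter-modal distance term weighted by a hypothetical parameter `beta`, and synthetic paired data. All names (`LinearAutoEncoder`, `batch_step`, `make_toy_pairs`, `demo`) are illustrative assumptions.

```python
import random


def matvec(M, v):
    """Return the matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def tmatvec(M, v):
    """Return M^T v without materialising the transpose."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def sub_outer(M, a, b, step):
    """In place: M -= step * outer(a, b)."""
    for row, ai in zip(M, a):
        for j, bj in enumerate(b):
            row[j] -= step * ai * bj


class LinearAutoEncoder:
    """Assumed stand-in for one modality's mapping function."""

    def __init__(self, n_in, n_latent, rng):
        def init(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        self.W = init(n_latent, n_in)  # encoder: input -> common space
        self.D = init(n_in, n_latent)  # decoder: common space -> input

    def encode(self, x):
        return matvec(self.W, x)

    def decode(self, h):
        return matvec(self.D, h)


def pair_loss(img_ae, txt_ae, x, y, beta):
    """Intra-modal reconstruction error plus beta * inter-modal distance."""
    hx, hy = img_ae.encode(x), txt_ae.encode(y)
    rec_x = sum((r - v) ** 2 for r, v in zip(img_ae.decode(hx), x))
    rec_y = sum((r - v) ** 2 for r, v in zip(txt_ae.decode(hy), y))
    inter = sum((a - b) ** 2 for a, b in zip(hx, hy))
    return rec_x + rec_y + beta * inter


def mean_loss(img_ae, txt_ae, pairs, beta):
    return sum(pair_loss(img_ae, txt_ae, x, y, beta) for x, y in pairs) / len(pairs)


def batch_step(img_ae, txt_ae, batch, beta, lr):
    """One gradient step using the gradient averaged over a mini-batch."""
    updates = []
    for x, y in batch:
        hx, hy = img_ae.encode(x), txt_ae.encode(y)
        diff = [a - b for a, b in zip(hx, hy)]
        for ae, v, h, sign in ((img_ae, x, hx, 1.0), (txt_ae, y, hy, -1.0)):
            err = [r - t for r, t in zip(ae.decode(h), v)]
            back = tmatvec(ae.D, err)  # D^T err
            g_h = [2 * b + 2 * beta * sign * d for b, d in zip(back, diff)]
            updates.append((ae, [2 * e for e in err], h, g_h, v))
    step = lr / len(batch)
    # Apply updates only after every gradient in the batch has been computed.
    for ae, g_r, h, g_h, v in updates:
        sub_outer(ae.D, g_r, h, step)
        sub_outer(ae.W, g_h, v, step)


def train(pairs, img_ae, txt_ae, batch_size=8, epochs=300, beta=1.0, lr=0.02):
    """Only one mini-batch is processed at a time."""
    for _ in range(epochs):
        for start in range(0, len(pairs), batch_size):
            batch_step(img_ae, txt_ae, pairs[start:start + batch_size], beta, lr)


def make_toy_pairs(rng, n_pairs=32, n_img=6, n_txt=4, n_latent=2):
    """Synthetic (image, text) feature pairs sharing a hidden latent vector."""
    A = [[rng.uniform(-1, 1) for _ in range(n_latent)] for _ in range(n_img)]
    B = [[rng.uniform(-1, 1) for _ in range(n_latent)] for _ in range(n_txt)]
    pairs = []
    for _ in range(n_pairs):
        z = [rng.uniform(-1, 1) for _ in range(n_latent)]
        pairs.append((matvec(A, z), matvec(B, z)))
    return pairs


def demo():
    rng = random.Random(0)
    pairs = make_toy_pairs(rng)
    img_ae = LinearAutoEncoder(6, 2, rng)
    txt_ae = LinearAutoEncoder(4, 2, rng)
    before = mean_loss(img_ae, txt_ae, pairs, beta=1.0)
    train(pairs, img_ae, txt_ae)
    after = mean_loss(img_ae, txt_ae, pairs, beta=1.0)
    return before, after
```

Calling `demo()` returns the mean objective value before and after training on the toy pairs. Once trained, a query from one modality could be encoded with its own auto-encoder and compared against encoded items of the other modality using Euclidean distance in the common space.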


Index Terms

1. Information systems
  1. Information retrieval




Published in
Proceedings of the VLDB Endowment, Volume 7, Issue 8
April 2014
60 pages
ISSN: 2150-8097
Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)
Publisher: VLDB Endowment
