Published in: International Journal of Multimedia Information Retrieval 1/2019

01-01-2019 | Regular Paper

Multi-view collective tensor decomposition for cross-modal hashing

Authors: Limeng Cui, Jiawei Zhang, Lifang He, Philip S. Yu

Abstract

With the development of social media, data often come from a variety of sources in different modalities. These data contain complementary information that can be used to build better learning algorithms. Such data exhibit dual heterogeneity: on the one hand, data obtained from multiple modalities are intrinsically different; on the other hand, features extracted from different views of the same modality are usually heterogeneous. Existing methods often address the first facet while ignoring the second. In this paper, we therefore propose a novel multi-view cross-modal hashing method, Multi-view Collective Tensor Decomposition (MCTD), which mitigates both kinds of heterogeneity simultaneously. MCTD fully exploits the multimodal multi-view features while discovering multiple separated subspaces, using the data categories as supervision information. The proposed cross-modal retrieval framework consists of three components: (1) two tensors that model the multi-view features of the two modalities, yielding a better representation of the complementary features and a latent representation space; (2) a block-diagonal loss that explicitly enforces a more discriminative latent space by leveraging supervision information; and (3) two feature projection matrices that characterize the data and generate latent representations for new incoming queries. We solve the MCTD objective function with an iterative updating optimization algorithm. Extensive experiments demonstrate the effectiveness of MCTD compared with state-of-the-art methods.
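The abstract states that MCTD learns two feature projection matrices that generate latent representations for new queries. The sketch below illustrates only the generic out-of-sample step common to cross-modal hashing, under loudly stated assumptions: each modality has a linear projection into a shared k-dimensional latent space, binary codes are obtained with the sign function, and retrieval ranks the other modality's database by Hamming distance. The projection matrices here are random placeholders (the names `W_img`, `W_txt`, `random_matrix`, and all sizes are hypothetical), not matrices learned by MCTD's collective tensor decomposition and block-diagonal loss, and the multi-view tensor structure is not modeled.

```python
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Latent representation W x, where W has shape (k, d) and x has length d."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def to_hash_code(z: Sequence[float]) -> List[int]:
    """Binarize a latent vector with the sign function (0 is mapped to +1)."""
    return [1 if v >= 0 else -1 for v in z]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def rank_database(query_code: Sequence[int], db_codes: Sequence[Sequence[int]]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(db_codes)), key=lambda i: (hamming(query_code, db_codes[i]), i))


def random_matrix(rows: int, cols: int, seed: int) -> Matrix:
    """Hypothetical stand-in for a learned projection matrix (Gaussian entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    n_bits, d_img, d_txt = 8, 5, 4                 # hypothetical code length and feature sizes
    W_img = random_matrix(n_bits, d_img, seed=0)   # placeholder image-side projection
    W_txt = random_matrix(n_bits, d_txt, seed=1)   # placeholder text-side projection
    rng = random.Random(2)
    query_image = [rng.random() for _ in range(d_img)]
    text_db = [[rng.random() for _ in range(d_txt)] for _ in range(6)]
    # Image-to-text retrieval: encode the query and the text database, then rank by Hamming distance.
    q_code = to_hash_code(project(W_img, query_image))
    db_codes = [to_hash_code(project(W_txt, t)) for t in text_db]
    print(rank_database(q_code, db_codes))
```

Because both modalities are mapped into the same code space, a query from one modality can be compared directly with stored codes from the other; in a real system the projections would come from training rather than random initialization.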

Metadata
Title
Multi-view collective tensor decomposition for cross-modal hashing
Authors
Limeng Cui
Jiawei Zhang
Lifang He
Philip S. Yu
Publication date
01-01-2019
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 1/2019
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-018-0164-0
