2018 | Original Paper | Book Chapter

Domain Invariant Subspace Learning for Cross-Modal Retrieval

Authors: Chenlu Liu, Xing Xu, Yang Yang, Huimin Lu, Fumin Shen, Yanli Ji

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Due to the rapid growth of multimodal data, cross-modal retrieval, which takes one type of data as the query to retrieve relevant data of another type, has drawn growing attention in recent years. To enable direct matching between different modalities, the key issue in cross-modal retrieval is eliminating the heterogeneity between modalities. Many existing approaches directly project samples of multimodal data into a common latent subspace under the supervision of class label information, where different samples within the same class contribute uniformly to the subspace construction. However, the subspace constructed by these methods may reveal neither the true importance of each sample nor the discrimination between different class labels. To tackle this problem, in this paper we regard different modalities as different domains and propose a Domain Invariant Subspace Learning (DISL) method to associate multimodal data. Specifically, DISL simultaneously minimizes the classification error with sample-wise weighting coefficients and preserves the structural similarity within and across modalities through graph regularization. The subspace learned by DISL therefore reflects the sample-wise importance and captures the discrimination between different class labels in multimodal data. Extensive experiments on three public datasets demonstrate the superiority of the proposed method over several state-of-the-art algorithms on cross-modal retrieval tasks such as image-to-text and text-to-image.
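The chapter page itself contains no code, but as a rough illustration of the objective sketched in the abstract, the following is a minimal NumPy sketch of a DISL-style loss: a sample-wise weighted classification error for each modality plus a graph-regularization term over the projected samples. The function name, the linear projections U_img and U_txt, the weight vector w, and the trade-off parameters alpha and beta are illustrative assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def disl_objective(X_img, X_txt, Y, U_img, U_txt, w, L, alpha=1.0, beta=0.1):
    """Hypothetical DISL-style objective (an assumption, not the authors' exact form).

    X_img: (n, d1) image features     X_txt: (n, d2) text features
    Y:     (n, c)  one-hot labels     w:     (n,)    sample-wise weights
    U_img: (d1, c), U_txt: (d2, c)    projections into a common subspace
    L:     (2n, 2n) graph Laplacian over all projected samples
    """
    P_img = X_img @ U_img  # projected image samples, (n, c)
    P_txt = X_txt @ U_txt  # projected text samples, (n, c)
    # Classification error weighted per sample, so samples of the same class
    # need not contribute uniformly to the subspace construction.
    cls = np.sum(w[:, None] * (P_img - Y) ** 2) \
        + np.sum(w[:, None] * (P_txt - Y) ** 2)
    # Graph regularization tr(P^T L P): samples that are similar within or
    # across modalities stay close in the learned subspace.
    P = np.vstack([P_img, P_txt])  # stack both modalities, (2n, c)
    graph = np.trace(P.T @ L @ P)
    # Frobenius-norm penalty to keep the projections well conditioned.
    reg = np.sum(U_img ** 2) + np.sum(U_txt ** 2)
    return cls + alpha * graph + beta * reg
```

At retrieval time, an image-to-text (or text-to-image) query would be projected with its own modality's matrix and ranked against the projected samples of the other modality, e.g. by cosine similarity in the common subspace.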

Metadata
Title
Domain Invariant Subspace Learning for Cross-Modal Retrieval
Authors
Chenlu Liu
Xing Xu
Yang Yang
Huimin Lu
Fumin Shen
Yanli Ji
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-73600-6_9
