Published in: Multimedia Systems 1/2023

23-09-2022 | Regular Paper

A triple fusion model for cross-modal deep hashing retrieval

Authors: Hufei Wang, Kaiqiang Zhao, Dexin Zhao

Abstract

In the field of resource retrieval, deep cross-modal retrieval is attracting increasing attention because it offers lower storage cost and faster retrieval speed. However, most current methods focus on the semantic similarity between hash codes. They ignore the similarity between the features that the model extracts from different modalities, which leads to sub-optimal results. In addition, the correlation between different modalities is difficult to exploit adequately. Therefore, to enhance the information correlation between different modalities, this paper proposes a triple fusion model for cross-modal deep hashing retrieval (SSTFH). To reduce the loss of feature information when features pass through the fully connected layer, we design a triple fusion strategy. Specifically, the first and second fusions are performed for images and text, respectively, to obtain pattern-specific features. The third fusion is used to obtain more relevant semantic features. In addition, we attempt to use the shared semantic information in the semantic features to guide the model in extracting correlations between different modalities. Comprehensive experiments have been conducted on the benchmark IAPR TC-12 and MS COCO datasets. On MS COCO, our approach outperforms all the deep baselines by an average of 7.74% on the image-to-text task and by 8.72% on the text-to-image task. On IAPR TC-12, our approach improves image retrieval by 7.07% and text retrieval by 4.88% on average.
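To illustrate why hash codes give the storage and speed benefits mentioned in the abstract, the following is a minimal sketch of generic cross-modal hashing retrieval. It is not the SSTFH model: the feature vectors are made-up values standing in for network outputs, and the helper names `sign_hash`, `hamming_distance`, and `retrieve` are hypothetical. Each modality's features are binarized by sign, and a text query is matched against image codes by Hamming distance.

```python
from typing import List, Sequence


def sign_hash(features: Sequence[float]) -> List[int]:
    """Binarize a real-valued feature vector into a {-1, +1} hash code."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def retrieve(query_code: Sequence[int],
             database_codes: Sequence[Sequence[int]],
             top_k: int = 3) -> List[int]:
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Assumed toy features: in a real system these would come from the
# text and image networks after training; the values here are invented.
text_features = [0.9, -0.2, 0.4, -0.7]
image_features = [
    [0.8, -0.1, 0.3, -0.5],
    [-0.6, 0.4, 0.2, -0.9],
    [0.5, -0.7, -0.3, -0.2],
    [-0.4, 0.6, -0.8, 0.1],
]

text_code = sign_hash(text_features)
image_codes = [sign_hash(f) for f in image_features]

# Text-to-image retrieval: rank images by Hamming distance to the text code.
print(retrieve(text_code, image_codes, top_k=3))  # prints [0, 2, 1]
```

Each item is stored as a short binary code rather than a dense float vector, and ranking needs only bitwise comparisons, which is where the lower storage cost and faster retrieval come from.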


Metadata
Title
A triple fusion model for cross-modal deep hashing retrieval
Authors
Hufei Wang
Kaiqiang Zhao
Dexin Zhao
Publication date
23-09-2022
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 1/2023
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-022-01005-6
