Multimedia Systems

23-09-2022 | Regular Paper

A triple fusion model for cross-modal deep hashing retrieval

Authors: Hufei Wang, Kaiqiang Zhao, Dexin Zhao

Published in: Multimedia Systems | Issue 1/2023

Abstract

In the field of resource retrieval, deep cross-modal retrieval is attracting increasing attention because it offers lower storage requirements and faster retrieval speed. However, most current methods focus on the semantic similarity between hash codes and ignore the similarity between the features that the model extracts from different modalities, which leads to sub-optimal results. In addition, the correlation between different modalities is difficult to exploit adequately. Therefore, to strengthen the information correlation between modalities, this paper proposes a triple fusion model for cross-modal deep hashing retrieval (SSTFH). To reduce the loss of feature information as features pass through the fully connected layer, we design a triple fusion strategy. Specifically, the first and second fusions are performed for images and text, respectively, to obtain pattern-specific features, and the third fusion is used to obtain more relevant semantic features. We also use the semantic information shared by these semantic features to guide the model in extracting correlations between different modalities. Comprehensive experiments were conducted on the benchmark IAPR TC-12 and MS COCO datasets. On MS COCO, our approach outperforms all the deep baselines by an average of 7.74% on the image-to-text task and by 8.72% on the text-to-image task. On IAPR TC-12, our approach improves image retrieval by 7.07% and text retrieval by 4.88% on average.
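The storage and speed advantages mentioned in the abstract come from comparing short binary codes instead of real-valued features. The following is a minimal, generic sketch of hash-based cross-modal retrieval, not the SSTFH model itself: it assumes each modality network already outputs a real-valued vector, binarizes it by sign, and ranks database items by Hamming distance. The function names and the 4-bit toy vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> int:
    """Binarize a real-valued feature vector by sign and pack the bits into an int."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two packed hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Toy text-to-image query with made-up network outputs (assumed, not from the paper).
text_query = to_hash_code([0.9, -0.2, 0.4, -0.7])   # bits 1010
image_codes = [
    to_hash_code([0.8, -0.1, 0.3, -0.5]),    # bits 1010
    to_hash_code([-0.6, 0.7, -0.2, 0.9]),    # bits 0101
    to_hash_code([0.5, 0.2, 0.1, -0.3]),     # bits 1110
]
ranking = rank_by_hamming(text_query, image_codes)
print(ranking)
```

Because each item is stored as a handful of bits and compared with an XOR and a bit count, the database stays small and a query is cheap to score. How the codes are learned, including the triple fusion described above, is outside the scope of this sketch.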

Title: A triple fusion model for cross-modal deep hashing retrieval
Authors: Hufei Wang, Kaiqiang Zhao, Dexin Zhao
Publication date: 23-09-2022
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 1/2023
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-022-01005-6
