
2020 | OriginalPaper | Chapter

Adversarial Query-by-Image Video Retrieval Based on Attention Mechanism

Authors : Ruicong Xu, Li Niu, Liqing Zhang

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

Query-by-image video retrieval (QBIVR) is a difficult cross-modal feature-matching task. A growing number of retrieval tasks require indexing videos that contain the activities shown in a query image, which makes extracting meaningful spatio-temporal video features crucial. In this paper, we propose an approach based on adversarial learning, termed the Adversarial Image-to-Video (AIV) approach. To capture the temporal patterns of videos, we use fully convolutional 3D ConvNet features to locate temporal regions likely to contain activities, and then obtain video bag features by 3D RoI pooling. To resolve the mismatch between image vector features and video bag features, and to identify which temporal information in a video is important, we add a Multiple Instance Learning (MIL) module that assigns a different weight to each temporal instance in a video bag. Moreover, we use a triplet loss to distinguish different semantic categories while supporting the intra-class variability of images and videos. In particular, AIV introduces a modality loss that acts as an adversary to the triplet loss during adversarial learning. The interplay between the two losses jointly bridges the domain gap across modalities. Extensive experiments on two widely used datasets verify the effectiveness of the proposed method compared with other methods.
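The abstract combines three ingredients: MIL attention pooling over a video bag, a triplet loss, and an adversarial modality loss. The pure-Python sketch below illustrates each one under stated assumptions. The linear softmax attention scoring (weight vector `w`), the logistic modality classifier (`u`, `b`), the squared-Euclidean distance, the margin, and the trade-off weight `lam` are all hypothetical choices made for illustration, not the exact formulation used by AIV. The learned projections that map images and video bags into a shared space are omitted.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def attention_mil_pool(instances, w):
    """Pool a bag of instance features (e.g. RoI-pooled temporal proposals).

    Assumption: each instance h_k is scored as w . h_k, the scores are
    softmax-normalised into weights a_k, and the bag vector is sum_k a_k h_k.
    """
    weights = softmax([dot(w, h) for h in instances])
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: same-category pairs closer than others by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def modality_loss(embeddings, labels, u, b=0.0):
    """Mean binary cross-entropy of a hypothetical logistic modality classifier.

    labels: 1 for image embeddings, 0 for video embeddings.
    """
    total = 0.0
    for e, y in zip(embeddings, labels):
        p = min(max(sigmoid(dot(u, e) + b), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(embeddings)

def embedding_objective(l_triplet, l_modality, lam=0.1):
    """Assumed minimax form: the embedding side minimises the triplet loss
    while maximising the modality classifier's loss (hence the minus sign)."""
    return l_triplet - lam * l_modality
```

For example, `attention_mil_pool([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [2.0, 0.0])` gives the first instance the largest weight because it has the highest score. In a training loop built on this sketch, the modality classifier would be updated to minimise `modality_loss`, while the feature projectors would be updated to minimise `embedding_objective`, pushing image and video embeddings toward being indistinguishable by modality.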


Metadata
Title
Adversarial Query-by-Image Video Retrieval Based on Attention Mechanism
Authors
Ruicong Xu
Li Niu
Liqing Zhang
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-37731-1_63