Published in: International Journal of Machine Learning and Cybernetics 12/2023

13 July 2023 | Original Article

Multi-scale fusion transformer based weakly supervised hashing learning for instance retrieval

Authors: Yuanhai Lv, Chen Jiao, Wanqing Zhao, Wei Zhao, Ziyu Guan, Xiaofei He

Abstract

Instance retrieval is concerned with obtaining representations of instances (objects) in images and using them for similarity comparisons between instances. However, most methods require instance-level category labels to train the model, which increases the annotation burden. Building on advances in convolutional neural networks (CNNs) and transformers in computer vision, we propose a hierarchical transformer with a spatial pyramid structure for weakly supervised multi-instance hash learning. It combines the local, multi-scale perception of CNNs with the global field of view of transformers. Further, it leverages the principle of multi-instance learning, which lets the proposed model learn an instance-level hash mapping in a weakly supervised manner. Experiments on three public datasets show improved results over typical methods, validating the effectiveness of the proposed method.
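
The two ideas named in the abstract, multi-instance learning and hash-based comparison, can be illustrated with a minimal Python sketch. It is not the authors' model: the function names (`bag_score`, `binarize`, `hamming`, `rank_instances`), the max-pooling aggregation, and the zero threshold are assumptions chosen for illustration, since the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple


def bag_score(instance_scores: Sequence[float]) -> float:
    """Multi-instance assumption: an image (bag) is positive for a class
    if at least one of its instances is, so take the max instance score.
    Max-pooling is one common choice, assumed here for illustration."""
    return max(instance_scores)


def binarize(features: Sequence[float]) -> List[int]:
    """Turn real-valued features into a binary hash code by
    thresholding at zero (an assumed, simple quantization rule)."""
    return [1 if x >= 0 else 0 for x in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_instances(
    query: Sequence[int],
    database: Sequence[Tuple[str, Sequence[int]]],
) -> List[Tuple[str, int]]:
    """Rank database instances by Hamming distance to the query code,
    closest first. Returns (instance_id, distance) pairs."""
    scored = [(name, hamming(query, code)) for name, code in database]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Only an image-level label is needed to supervise the bag score.
    print(bag_score([0.1, 0.9, 0.2]))

    query = binarize([0.3, -1.2, 0.0, 2.0])
    database = [
        ("a", [1, 1, 1, 0]),
        ("b", [1, 0, 1, 1]),
        ("c", [0, 1, 0, 0]),
    ]
    print(rank_instances(query, database))
```

Under the multi-instance view, only the image-level label supervises `bag_score`, while retrieval itself compares per-instance binary codes, which is why Hamming distance is the natural ranking criterion.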

Metadata
Title
Multi-scale fusion transformer based weakly supervised hashing learning for instance retrieval
Authors
Yuanhai Lv
Chen Jiao
Wanqing Zhao
Wei Zhao
Ziyu Guan
Xiaofei He
Publication date
13 July 2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 12/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01907-5
