International Journal of Machine Learning and Cybernetics

13.07.2023 | Original Article

Multi-scale fusion transformer based weakly supervised hashing learning for instance retrieval

Authors: Yuanhai Lv, Chen Jiao, Wanqing Zhao, Wei Zhao, Ziyu Guan, Xiaofei He

Published in: International Journal of Machine Learning and Cybernetics | Issue 12/2023

Abstract

Instance retrieval is concerned with obtaining representations of instances (objects) in images and using them for similarity comparisons between instances. However, most methods require instance-level category labels to train the model, which increases the annotation burden. Building on advances in convolutional neural networks (CNNs) and transformers in computer vision, we propose a hierarchical model with a spatial pyramid structure for weakly supervised multi-instance hash learning. It combines the local, multi-scale perception of CNNs with the global field of view of the Transformer. It also applies the principle of multi-instance learning, which lets the proposed model learn an instance-level hash mapping under weak supervision. Experiments on three public datasets show improved results over typical methods, validating the effectiveness of the proposed method.
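The abstract does not spell out the model, so the following is only a minimal sketch of the two general ideas it names, not the authors' implementation. It assumes that instance features are binarized with a sign function over a linear projection, that an image (bag) is scored by max-pooling over its instances as in standard multi-instance learning, and that retrieval ranks codes by Hamming distance. All function names and the projection matrix are hypothetical.

```python
import math


def instance_hash_codes(features, projection):
    """Binarize each instance feature vector as sign(W x), with values in {-1, +1}.

    `projection` is a hypothetical list of weight rows, one per hash bit.
    """
    codes = []
    for x in features:
        code = []
        for w in projection:
            activation = sum(wi * xi for wi, xi in zip(w, x))
            code.append(1 if activation >= 0 else -1)
        codes.append(code)
    return codes


def hamming_distance(a, b):
    """Count the bit positions where two codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def bag_probability(instance_probs):
    """Multi-instance assumption: a bag is positive if at least one instance is."""
    return max(instance_probs)


def bag_bce_loss(instance_probs, bag_label):
    """Binary cross-entropy on the bag score, so only image-level labels are needed."""
    p = min(max(bag_probability(instance_probs), 1e-7), 1 - 1e-7)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1 - p))


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
```

In this sketch the loss touches only the bag-level score, which is what lets image-level labels stand in for instance-level annotation. The binary codes are computed per instance, so retrieval can still compare individual objects.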

Title: Multi-scale fusion transformer based weakly supervised hashing learning for instance retrieval
Authors: Yuanhai Lv, Chen Jiao, Wanqing Zhao, Wei Zhao, Ziyu Guan, Xiaofei He
Publication date: 13.07.2023
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 12/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-023-01907-5
