International Journal of Multimedia Information Retrieval

1 December 2023 | Regular Paper

FOF: a fine-grained object detection and feature extraction end-to-end network

Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao

Published in: International Journal of Multimedia Information Retrieval | Issue 2/2023

Abstract

Widely used object detectors can predict only the target classes present in their training set. In fine-grained object detection tasks such as commodity detection, introducing a new target class therefore requires retraining the model, which significantly reduces the flexibility of the algorithm in applications. To address this problem, we propose FOF, an end-to-end fine-grained object detection and feature extraction network. To detect and identify objects beyond the categories of the training set, the category output in the network head is removed and replaced with a 128-dimensional feature vector. The ArcFace loss function is used during training to improve feature classification. Because there is no category output, we propose an improved non-maximum suppression algorithm, non-maximum suppression-feature similarity, which uses feature similarity to distinguish prediction boxes of the same class from those of different classes. During inference, FOF outputs prediction boxes and feature vectors and matches them against the feature vectors in a feature gallery to determine the category of each detected object, completing object detection and recognition. Experimental results indicate that FOF achieves high accuracy on the MS COCO, PASCAL VOC2012, and SmartUVM datasets, as well as on the large-scale, fine-grained Retail Product Checkout dataset. In addition, the method exhibits a low equal error rate when identifying new categories, achieving the objective of detecting and identifying new categories without retraining the model.
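The abstract does not give the exact suppression rule or matching procedure, so the following is only a minimal sketch of the two ideas it describes: suppressing overlapping boxes only when their features are similar, and assigning a category by nearest match in a feature gallery. The function names, the cosine similarity measure, and both thresholds are assumptions made for illustration, and the toy vectors in the usage note are 2-dimensional rather than 128-dimensional.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_feature_similarity(boxes, scores, feats, iou_thr=0.5, sim_thr=0.5):
    """Greedy NMS without class labels (assumed rule, not the paper's).

    A box is suppressed only if it overlaps an already kept box AND its
    feature vector is similar to that box's, i.e. the two boxes are
    treated as the same class. Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        duplicate = any(
            iou(boxes[i], boxes[k]) > iou_thr
            and cosine(feats[i], feats[k]) > sim_thr
            for k in keep
        )
        if not duplicate:
            keep.append(i)
    return keep


def match_gallery(feat, gallery, min_sim=0.5):
    """Return the gallery label most similar to feat, or None if no
    entry exceeds min_sim (hypothetical rejection threshold)."""
    best_label, best_sim = None, min_sim
    for label, ref in gallery.items():
        sim = cosine(feat, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Under these assumptions, two heavily overlapping boxes with near-identical features collapse to one detection, while an overlapping box with a dissimilar feature survives as a different object. Supporting a new category would then amount to adding an entry to `gallery`, for example `gallery["new_item"] = feature_vector`, with no change to the detector itself.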

Title: FOF: a fine-grained object detection and feature extraction end-to-end network
Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao
Publication date: 1 December 2023
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 2/2023
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-023-00306-4