International Journal of Multimedia Information Retrieval

01-12-2023 | Regular Paper

FOF: a fine-grained object detection and feature extraction end-to-end network

Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao

Published in: International Journal of Multimedia Information Retrieval | Issue 2/2023

Abstract

Widely used object detectors can only predict target classes that are present in the training set. In fine-grained object detection tasks such as commodity detection, introducing a new target class therefore requires retraining the model, which significantly reduces the flexibility of the algorithm in applications. In response to this problem, we propose an end-to-end fine-grained object detection and feature extraction network (FOF). To detect and identify objects beyond the target categories of the training set, the category output in the network head is removed and replaced with a 128-dimensional feature vector. During training, we use the ArcFace loss function to improve feature classification. Since there is no category output, we propose an improved non-maximum suppression algorithm, non-maximum suppression-feature similarity, which uses feature similarity to distinguish prediction boxes of the same class from those of different classes. During inference, FOF outputs prediction boxes and feature vectors and matches them against the feature vectors in a feature gallery to determine the category of each detected object, completing both detection and recognition. Experimental results indicate that FOF achieves high accuracy on the MS COCO, PASCAL VOC2012, and SmartUVM datasets and on the large-scale, fine-grained Retail Product Checkout dataset. In addition, the method exhibits a low equal error rate when identifying new categories, achieving the objective of detecting and identifying new categories without retraining the model.
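The abstract does not give the details of non-maximum suppression-feature similarity or of gallery matching, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a lower-scoring box is suppressed only when it both overlaps a kept box and has a similar feature vector, and that gallery matching picks the most similar gallery entry by cosine similarity. The function names, the thresholds (`iou_thr`, `sim_thr`, `min_sim`), and the toy 2-dimensional features (the paper uses 128 dimensions) are all hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def nms_feature_similarity(dets, iou_thr=0.5, sim_thr=0.5):
    """Assumed rule: drop a box only if it overlaps a kept box AND has
    a similar feature (treated as a duplicate of the same object).
    dets is a list of (box, score, feature) tuples."""
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        box, _, feat = det
        duplicate = any(
            iou(box, k[0]) > iou_thr and cosine(feat, k[2]) > sim_thr
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept

def match_gallery(feat, gallery, min_sim=0.5):
    """Return the gallery label most similar to feat, or None if no
    entry exceeds min_sim (hypothetical rejection threshold)."""
    best_label, best_sim = None, min_sim
    for label, ref in gallery.items():
        sim = cosine(feat, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Toy data: three heavily overlapping boxes, two distinct products.
dets = [
    ((0, 0, 10, 10), 0.9, (1.0, 0.0)),   # product A
    ((1, 1, 11, 11), 0.8, (0.9, 0.1)),   # duplicate of A (similar feature)
    ((1, 0, 11, 10), 0.7, (0.0, 1.0)),   # different product, same place
]
gallery = {"cola": (1.0, 0.0), "chips": (0.0, 1.0)}

kept = nms_feature_similarity(dets)
labels = [match_gallery(feat, gallery) for _, _, feat in kept]
print(labels)
```

Under these assumptions, adding a new product category only means adding a new entry to `gallery`; nothing in the detection or matching code depends on a fixed class list, which is the flexibility the abstract describes.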

Title: FOF: a fine-grained object detection and feature extraction end-to-end network
Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao
Publication date: 01-12-2023
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 2/2023
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-023-00306-4
