Published in: International Journal of Multimedia Information Retrieval 2/2023

01-12-2023 | Regular Paper

FOF: a fine-grained object detection and feature extraction end-to-end network

Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao

Abstract

Widely used object detectors can only predict target classes that are present in the training set. In fine-grained object detection tasks such as commodity detection, introducing a new target class therefore requires retraining the model, which significantly reduces the flexibility of the algorithm in applications. To address this problem, we propose FOF, an end-to-end fine-grained object detection and feature extraction network. To detect and identify objects beyond the categories of the training set, the category output in the network head is removed and replaced with a 128-dimensional feature vector. We use the ArcFace loss function to improve feature classification during training. Because there is no category output, we propose an improved non-maximum suppression algorithm, non-maximum suppression-feature similarity, which uses feature similarity to distinguish prediction boxes of the same class from those of different classes. During inference, FOF outputs prediction boxes and feature vectors and matches them against the feature vectors in a feature gallery to determine the category of each detected object, completing object detection and recognition. Experimental results indicate that FOF achieves high accuracy on the MS COCO, PASCAL VOC2012, and SmartUVM datasets and on the large-scale, fine-grained Retail Product Checkout dataset. In addition, the method exhibits a low equal error rate when identifying new categories, achieving the objective of detecting and identifying new categories without retraining the model.
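
The two inference steps described in the abstract, suppressing duplicate boxes by feature similarity and then matching features against a gallery, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact suppression rule, so the sketch assumes a box is dropped only when it both overlaps a higher-scoring kept box and has a similar feature vector. The function names (`nms_fs`, `match_gallery`), the thresholds, cosine similarity as the similarity measure, and the 3-dimensional toy features (the paper uses 128 dimensions) are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_fs(boxes, scores, feats, iou_thr=0.5, sim_thr=0.5):
    """Assumed rule: drop a box only if it overlaps a kept box AND their
    features are similar, so overlapping boxes of different classes survive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        duplicate = any(
            iou(boxes[i], boxes[k]) > iou_thr
            and cosine(feats[i], feats[k]) > sim_thr
            for k in keep
        )
        if not duplicate:
            keep.append(i)
    return keep

def match_gallery(feat, gallery, min_sim=0.5):
    """Return the gallery label most similar to feat, or None if no entry
    exceeds min_sim (hypothetical rejection threshold)."""
    best_label, best_sim = None, min_sim
    for label, ref in gallery.items():
        sim = cosine(feat, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Toy data: boxes 0 and 1 overlap and look alike; box 2 overlaps box 0 but
# has a different feature; box 3 is far away and not yet in the gallery.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
feats = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
gallery = {"cola": (1.0, 0.0, 0.0), "chips": (0.0, 1.0, 0.0)}

kept = nms_fs(boxes, scores, feats)
labels = [match_gallery(feats[i], gallery) for i in kept]

# Registering a new category only adds a gallery entry; no retraining.
gallery["water"] = (0.0, 0.0, 1.0)
labels_after = [match_gallery(feats[i], gallery) for i in kept]
```

In this toy run, box 1 is suppressed as a duplicate of box 0, while box 2 survives despite its high overlap because its feature differs. The last box is unmatched until a gallery entry is added for it, which mirrors the abstract's point that new categories can be recognized without retraining.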

Metadata
Title: FOF: a fine-grained object detection and feature extraction end-to-end network
Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao
Publication date: 01-12-2023
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 2/2023
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-023-00306-4
