Published in: International Journal of Multimedia Information Retrieval, Issue 2/2023

1 December 2023 | Regular Paper

FOF: a fine-grained object detection and feature extraction end-to-end network

Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao

Abstract

Widely used object detectors can only predict target classes that are present in the training set. In fine-grained object detection tasks such as commodity detection, introducing a new target class therefore requires retraining the model, which significantly reduces the flexibility of the algorithm in applications. To address this problem, we propose an end-to-end fine-grained object detection and feature extraction network (FOF). To detect and identify objects beyond the categories of the training set, the category output in the network head is removed and replaced with a 128-dimensional feature vector. We use the ArcFace loss function during training to improve feature classification. Because there is no category output, we propose an improved non-maximum suppression algorithm, non-maximum suppression-feature similarity, which uses feature similarity to distinguish prediction boxes of the same class from those of different classes. During inference, FOF outputs prediction boxes and feature vectors and matches these vectors against the feature vectors in a feature gallery to determine the category of each detected object, completing object detection and recognition. Experimental results indicate that FOF achieves high accuracy on the MS COCO, PASCAL VOC2012, and SmartUVM datasets and on the large-scale, fine-grained Retail Product Checkout dataset. In addition, the method exhibits a low equal error rate when identifying new categories, achieving the objective of detecting and identifying new categories without retraining the model.
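
The inference steps described in the abstract (suppressing duplicate boxes by feature similarity, then matching features against a gallery) can be illustrated with a minimal sketch. The abstract does not give the exact suppression rule, so everything below is an assumption made for illustration: a box is treated as a duplicate only when it overlaps a kept box and their features are similar, similarity is measured by cosine similarity, and the thresholds, the function names (`nms_feature_similarity`, `match_gallery`), the labels, and the 3-dimensional toy vectors (standing in for the 128-dimensional features) are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def iou(p, q):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(p[2], q[2]) - max(p[0], q[0]))
    ih = max(0.0, min(p[3], q[3]) - max(p[1], q[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (q[2] - q[0]) * (q[3] - q[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_feature_similarity(dets, iou_thr=0.5, sim_thr=0.5):
    """dets: list of (box, score, feature) tuples.

    Assumed rule (not taken from the paper): a box is suppressed only
    when it overlaps an already kept box AND their features are similar,
    i.e. both boxes probably describe the same object.
    """
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        box, _, feat = det
        duplicate = any(
            iou(box, k[0]) > iou_thr and cosine(feat, k[2]) > sim_thr
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept

def match_gallery(feat, gallery, min_sim=0.5):
    """gallery: dict mapping label -> list of reference features.

    Returns (label, similarity) of the most similar reference, or
    (None, similarity) when nothing reaches min_sim (unknown object).
    """
    best_label, best_sim = None, -1.0
    for label, refs in gallery.items():
        for ref in refs:
            s = cosine(feat, ref)
            if s > best_sim:
                best_label, best_sim = label, s
    if best_sim < min_sim:
        return None, best_sim
    return best_label, best_sim

# Toy data: three heavily overlapping boxes, two of them with similar features.
dets = [
    ((0, 0, 10, 10), 0.9, [1.0, 0.0, 0.0]),
    ((1, 1, 11, 11), 0.8, [0.9, 0.1, 0.0]),  # similar feature: duplicate
    ((1, 0, 10, 11), 0.7, [0.0, 1.0, 0.0]),  # dissimilar feature: kept
]
gallery = {"cola": [[1.0, 0.0, 0.0]], "juice": [[0.0, 1.0, 0.0]]}

kept = nms_feature_similarity(dets)
labels = [match_gallery(feat, gallery)[0] for _, _, feat in kept]

# A new category is added by extending the gallery; no retraining is involved.
gallery["tea"] = [[0.0, 0.0, 1.0]]
new_label, _ = match_gallery([0.0, 0.1, 0.99], gallery)
```

In this sketch, registering a new product only requires adding its reference features to `gallery`, which mirrors the abstract's goal of recognising new categories without retraining. A real system would likely use batched tensor operations rather than Python loops.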

Metadata
Title: FOF: a fine-grained object detection and feature extraction end-to-end network
Authors: Wenzhong Shen, Jinpeng Chen, Jie Shao
Publication date: 1 December 2023
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval, Issue 2/2023
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-023-00306-4
