Published in: Neural Processing Letters | Issue 3/2021

Published: 20 March 2021

Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection

Written by: Zhiwei Yan, Huicheng Zheng, Ye Li, Lvran Chen

Abstract

Current detection networks usually struggle to detect small-scale object instances because of spatial information loss and a lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays particular attention to modeling detailed information. LocalNet is built upon our redesigned detection-oriented backbone, called long neck ResNet, which aims to preserve more detailed information in the early stage to enhance the representation of small objects. Furthermore, to enhance the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field range. Moreover, we explore a method for training detectors nearly or totally from scratch, which allows more freedom in designing network structures. With nearly \(94\%\) of the pretrained parameters in the backbone randomly reinitialized, our model improves the mAP of our baseline model from \(75.0\%\) to \(82.3\%\) on the PASCAL VOC dataset with an input size of \(300\times 300\) and achieves state-of-the-art accuracy. Even when trained from scratch, our model achieves \(80.8\%\) mAP, which is 5.8 percentage points higher than the mAP of our baseline model with a fully pretrained backbone.
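The gains quoted in the abstract are absolute differences in mAP, not relative improvements. The short sketch below recomputes them from the three figures given above. The constant and function names are illustrative assumptions, not taken from the paper.

```python
# mAP figures quoted in the abstract (PASCAL VOC, 300x300 input), in percent.
BASELINE_MAP = 75.0      # baseline model with a fully pretrained backbone
NEAR_SCRATCH_MAP = 82.3  # LocalNet, nearly 94% of backbone parameters reinitialized
SCRATCH_MAP = 80.8       # LocalNet trained from scratch


def absolute_gain(new_map: float, old_map: float) -> float:
    """Difference between two mAP values, in percentage points."""
    return round(new_map - old_map, 1)


def relative_gain(new_map: float, old_map: float) -> float:
    """Improvement expressed as a percentage of the old mAP value."""
    return round((new_map - old_map) / old_map * 100, 2)


if __name__ == "__main__":
    print(absolute_gain(NEAR_SCRATCH_MAP, BASELINE_MAP))  # 7.3 points
    print(absolute_gain(SCRATCH_MAP, BASELINE_MAP))       # 5.8 points
    print(relative_gain(SCRATCH_MAP, BASELINE_MAP))       # 7.73 percent
```

The 5.8 figure in the abstract matches the absolute difference between 80.8 and 75.0; the corresponding relative improvement over the baseline would be about 7.7%.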

Title: Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection
Written by: Zhiwei Yan, Huicheng Zheng, Ye Li, Lvran Chen
Publication date: 20 March 2021
Publisher: Springer US
Published in: Neural Processing Letters / Issue 3/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-021-10493-y
