Skip to main content
Top
Published in: Neural Computing and Applications 12/2024

15-02-2024 | Review

A review of small object detection based on deep learning

Authors: Wei Wei, Yu Cheng, Jiafeng He, Xiyue Zhu

Published in: Neural Computing and Applications | Issue 12/2024

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Small object detection is widely used in a variety of fields such as automatic driving, UAV-based object detection, and aerial image detection. However, small objects carry limited information, making it difficult for detectors to detect small objects. In recent years, the development of deep learning has significantly improved the performance of small object detection. This paper provides a comprehensive review to help further the development of small target detection. We summarize the challenges related to small object detection and analyze solutions to these challenges in existing works, including integrating the feature at different layers, enriching available information, balancing the number of positive and negative samples for small objects, and increasing sufficient small object instances. We discuss related methods developed in three application areas, including automatic driving, UAV search and rescue, and aerial image detection. In addition, we thoroughly analyze the performance of typical small object detection methods on popular datasets. Finally, based on the comprehensive review of small object detection methods, we point out possible research directions for future studies.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587 Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
2.
go back to reference Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448 Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
4.
go back to reference Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788 Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
5.
go back to reference Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271 Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
6.
go back to reference Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Springer, Cham, pp 740–755CrossRef Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Springer, Cham, pp 740–755CrossRef
7.
go back to reference Zou Z, Chen K, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv e-prints, 1905 Zou Z, Chen K, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv e-prints, 1905
9.
go back to reference Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-CNN for small object detection. In: Lai S-H, Lepetit V, Nishino K, Sato Y (eds) Computer vision—ACCV 2016. Springer, Cham, pp 214–230CrossRef Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-CNN for small object detection. In: Lai S-H, Lepetit V, Nishino K, Sato Y (eds) Computer vision—ACCV 2016. Springer, Cham, pp 214–230CrossRef
10.
go back to reference Xiao J, Ehinger KA, Hays J, Torralba A, Oliva A (2016) Sun database: exploring a large collection of scene categories. Int J Comput Vis 119(1):3–22MathSciNetCrossRef Xiao J, Ehinger KA, Hays J, Torralba A, Oliva A (2016) Sun database: exploring a large collection of scene categories. Int J Comput Vis 119(1):3–22MathSciNetCrossRef
11.
go back to reference Chen G, Wang H, Chen K, Li Z, Song Z, Liu Y, Chen W, Knoll A (2020) A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans Syst Man Cybern Syst 52(2):936–953CrossRef Chen G, Wang H, Chen K, Li Z, Song Z, Liu Y, Chen W, Knoll A (2020) A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans Syst Man Cybern Syst 52(2):936–953CrossRef
13.
go back to reference Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125 Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
15.
go back to reference Liang Z, Shao J, Zhang D, Gao L (2018) Small object detection using deep feature pyramid networks. In: Pacific rim conference on multimedia. Springer, pp 554–564 Liang Z, Shao J, Zhang D, Gao L (2018) Small object detection using deep feature pyramid networks. In: Pacific rim conference on multimedia. Springer, pp 554–564
16.
go back to reference Ghiasi G, Lin T-Y, Le QV (2019) NAS-FPN: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7036–7045 Ghiasi G, Lin T-Y, Le QV (2019) NAS-FPN: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7036–7045
17.
go back to reference Qiao S, Chen L-C, Yuille A (2021) Detectors: detecting objects with recursive feature pyramid and switchable atrous convolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10213–10224 Qiao S, Chen L-C, Yuille A (2021) Detectors: detecting objects with recursive feature pyramid and switchable atrous convolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10213–10224
18.
go back to reference Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W et al. (2022) Yolov6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W et al. (2022) Yolov6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:​2209.​02976
19.
go back to reference Woo S, Hwang S, Kweon IS (2018) Stairnet: top-down semantic aggregation for accurate one shot detection. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1093–1102 Woo S, Hwang S, Kweon IS (2018) Stairnet: top-down semantic aggregation for accurate one shot detection. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1093–1102
20.
go back to reference Guo C, Fan B, Zhang Q, Xiang S, Pan C (2019) Augfpn: improving multi-scale feature learning for object detection. Journal Article Guo C, Fan B, Zhang Q, Xiang S, Pan C (2019) Augfpn: improving multi-scale feature learning for object detection. Journal Article
21.
go back to reference Nayan A-A, Saha J, Mozumder AN, Mahmud KR, Azad AKA (2020) Real time multi-class object detection and recognition using vision augmentation algorithm. arXiv preprint arXiv:2003.07442 Nayan A-A, Saha J, Mozumder AN, Mahmud KR, Azad AKA (2020) Real time multi-class object detection and recognition using vision augmentation algorithm. arXiv preprint arXiv:​2003.​07442
24.
go back to reference Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790 Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
26.
go back to reference Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141 Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
27.
go back to reference Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 510–519 Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 510–519
28.
go back to reference Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, Sun Y, He T, Mueller J, Manmatha R et al. (2022) Resnest: split-attention networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2736–2746 Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, Sun Y, He T, Mueller J, Manmatha R et al. (2022) Resnest: split-attention networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2736–2746
29.
go back to reference Dai Y, Gieseke F, Oehmcke S, Wu Y, Barnard K (2021) Attentional feature fusion. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3560–3569 Dai Y, Gieseke F, Oehmcke S, Wu Y, Barnard K (2021) Attentional feature fusion. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3560–3569
32.
go back to reference Zeng X, Ouyang W, Yan J, Li H, Xiao T, Wang K, Liu Y, Zhou Y, Yang B, Wang Z et al (2017) Crafting gbd-net for object detection. IEEE Trans Pattern Anal Mach Intell 40(9):2109–2123CrossRefPubMed Zeng X, Ouyang W, Yan J, Li H, Xiao T, Wang K, Liu Y, Zhou Y, Yang B, Wang Z et al (2017) Crafting gbd-net for object detection. IEEE Trans Pattern Anal Mach Intell 40(9):2109–2123CrossRefPubMed
33.
go back to reference Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 28(5):2439–2450ADSMathSciNetCrossRef Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 28(5):2439–2450ADSMathSciNetCrossRef
34.
go back to reference Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp 797–813 Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp 797–813
35.
go back to reference Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883 Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
36.
go back to reference Le QV, Jaitly N, Hinton GE (2015) A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 Le QV, Jaitly N, Hinton GE (2015) A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:​1504.​00941
37.
go back to reference Zhu Y, Urtasun R, Salakhutdinov R, Fidler S (2015) segdeepm: exploiting segmentation and context in deep neural networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4703–4711 Zhu Y, Urtasun R, Salakhutdinov R, Fidler S (2015) segdeepm: exploiting segmentation and context in deep neural networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4703–4711
38.
go back to reference Liu Y, Wang R, Shan S, Chen X (2018) Structure inference net: object detection using scene-level context and instance-level relationships. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6985–6994 Liu Y, Wang R, Shan S, Chen X (2018) Structure inference net: object detection using scene-level context and instance-level relationships. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6985–6994
39.
go back to reference Fu K, Li J, Ma L, Mu K, Tian Y (2020) Intrinsic relationship reasoning for small object detection. arXiv e-prints, 2009 Fu K, Li J, Ma L, Mu K, Tian Y (2020) Intrinsic relationship reasoning for small object detection. arXiv e-prints, 2009
40.
41.
go back to reference Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3588–3597 Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3588–3597
42.
go back to reference Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30
44.
go back to reference Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Signal Process Mag 35(1):53–65ADSCrossRef Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Signal Process Mag 35(1):53–65ADSCrossRef
45.
go back to reference Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230 Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230
46.
go back to reference Bai Y, Zhang Y, Ding M, Ghanem B (2018) SOD-MTGAN: small object detection via multi-task generative adversarial network. Springer, Cham, pp 210–226 Bai Y, Zhang Y, Ding M, Ghanem B (2018) SOD-MTGAN: small object detection via multi-task generative adversarial network. Springer, Cham, pp 210–226
47.
go back to reference Noh J, Bae W, Lee W, Seo J, Kim G (2019) Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9725–9734 Noh J, Bae W, Lee W, Seo J, Kim G (2019) Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9725–9734
48.
go back to reference Liu J, Li C, Liang F, Lin C, Sun M, Yan J, Ouyang W, Xu D (2021) Inception convolution with efficient dilation search. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11486–11495 Liu J, Li C, Liang F, Lin C, Sun M, Yan J, Ouyang W, Xu D (2021) Inception convolution with efficient dilation search. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11486–11495
49.
go back to reference Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874 Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv preprint arXiv:​1509.​04874
50.
go back to reference Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9627–9636 Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9627–9636
51.
go back to reference Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 840–849 Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 840–849
52.
go back to reference Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398ADSCrossRef Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398ADSCrossRef
53.
go back to reference Chen R, Liu Y, Zhang M, Liu S, Yu B, Tai Y-W (2020) Dive deeper into box for object detection. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part XXII 16. Springer, pp 412–428 Chen R, Liu Y, Zhang M, Liu S, Yu B, Tai Y-W (2020) Dive deeper into box for object detection. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part XXII 16. Springer, pp 412–428
54.
go back to reference Tychsen-Smith L, Petersson L (2017) Denet: scalable real-time object detection with directed sparse sampling. In: Proceedings of the IEEE international conference on computer vision, pp 428–436 Tychsen-Smith L, Petersson L (2017) Denet: scalable real-time object detection with directed sparse sampling. In: Proceedings of the IEEE international conference on computer vision, pp 428–436
56.
go back to reference Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750 Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
57.
go back to reference Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6569–6578 Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6569–6578
58.
go back to reference Law H, Teng Y, Russakovsky O, Deng J (2019) Cornernet-lite: efficient keypoint based object detection. arXiv e-prints, 1904 Law H, Teng Y, Russakovsky O, Deng J (2019) Cornernet-lite: efficient keypoint based object detection. arXiv e-prints, 1904
59.
go back to reference Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 850–859 Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 850–859
60.
go back to reference Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: point set representation for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9657–9666 Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: point set representation for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9657–9666
61.
go back to reference Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) Faceboxes: a CPU real-time face detector with high accuracy. In: 2017 IEEE international joint conference on biometrics (IJCB). IEEE, pp 1–9 Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) Faceboxes: a CPU real-time face detector with high accuracy. In: 2017 IEEE international joint conference on biometrics (IJCB). IEEE, pp 1–9
62.
go back to reference Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision, pp 192–201 Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision, pp 192–201
63.
go back to reference Eggert C, Zecha D, Brehm S, Lienhart R (2017) Improving small object proposals for company logo detection. In: Proceedings of the 2017 ACM on international conference on multimedia retrieval, pp 167–174 Eggert C, Zecha D, Brehm S, Lienhart R (2017) Improving small object proposals for company logo detection. In: Proceedings of the 2017 ACM on international conference on multimedia retrieval, pp 167–174
65.
go back to reference Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2009) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88:303–338CrossRef Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2009) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88:303–338CrossRef
66.
67.
go back to reference Zhao M, Cheng L, Yang X, Feng P, Liu L, Wu N (2019) Tbc-net: a real-time detector for infrared small target detection using semantic constraint. arXiv preprint arXiv:2001.05852 Zhao M, Cheng L, Yang X, Feng P, Liu L, Wu N (2019) Tbc-net: a real-time detector for infrared small target detection using semantic constraint. arXiv preprint arXiv:​2001.​05852
68.
go back to reference Gao C, Meng D, Yang Y, Wang Y, Zhou X, Hauptmann AG (2013) Infrared patch-image model for small target detection in a single image. IEEE Trans Image Process 22(12):4996–5009ADSMathSciNetCrossRefPubMed Gao C, Meng D, Yang Y, Wang Y, Zhou X, Hauptmann AG (2013) Infrared patch-image model for small target detection in a single image. IEEE Trans Image Process 22(12):4996–5009ADSMathSciNetCrossRefPubMed
69.
go back to reference Chen C, Zhang Y, Lv Q, Wei S, Wang X, Sun X, Dong J (2019) Rrnet: a hybrid detector for object detection in drone-captured images. In: Proceedings of the IEEE/CVF international conference on computer vision workshops, pp 0–0 Chen C, Zhang Y, Lv Q, Wei S, Wang X, Sun X, Dong J (2019) Rrnet: a hybrid detector for object detection in drone-captured images. In: Proceedings of the IEEE/CVF international conference on computer vision workshops, pp 0–0
70.
go back to reference Chen Y, Zhang P, Li Z, Li Y, Zhang X, Qi L, Sun J, Jia J (2020) Dynamic scale training for object detection. Journal Article Chen Y, Zhang P, Li Z, Li Y, Zhang X, Qi L, Sun J, Jia J (2020) Dynamic scale training for object detection. Journal Article
71.
go back to reference Ou Z, Xiao F, Xiong B, Shi S, Song M (2019) Famn: feature aggregation multipath network for small traffic sign detection. IEEE Access 7:178798–178810CrossRef Ou Z, Xiao F, Xiong B, Shi S, Song M (2019) Famn: feature aggregation multipath network for small traffic sign detection. IEEE Access 7:178798–178810CrossRef
74.
go back to reference Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S (2016) Traffic-sign detection and classification in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2110–2118 Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S (2016) Traffic-sign detection and classification in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2110–2118
81.
go back to reference He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
82.
go back to reference Liang X, Zhang J, Zhuo L, Li Y, Tian Q (2019) Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis. IEEE Trans Circuits Syst Video Technol 30(6):1758–1770CrossRef Liang X, Zhang J, Zhuo L, Li Y, Tian Q (2019) Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis. IEEE Trans Circuits Syst Video Technol 30(6):1758–1770CrossRef
83.
go back to reference Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. Springer, Cham, pp 21–37 Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. Springer, Cham, pp 21–37
86.
go back to reference Tian G, Liu J, Yang W (2021) A dual neural network for object detection in UAV images. Neurocomputing 443:292–301CrossRef Tian G, Liu J, Yang W (2021) A dual neural network for object detection in UAV images. Neurocomputing 443:292–301CrossRef
87.
go back to reference Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:​2207.​02696
88.
go back to reference Zhao H, Zhang H, Zhao Y (2023) Yolov7-sea: object detection of maritime UAV images based on improved yolov7. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 233–238 Zhao H, Zhang H, Zhao Y (2023) Yolov7-sea: object detection of maritime UAV images based on improved yolov7. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 233–238
89.
go back to reference Yang X, Yang J, Yan J, Zhang Y, Zhang T, Guo Z, Sun X, Fu K (2019) Scrdet: towards more robust detection for small, cluttered and rotated objects. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8232–8241 Yang X, Yang J, Yan J, Zhang Y, Zhang T, Guo Z, Sun X, Fu K (2019) Scrdet: towards more robust detection for small, cluttered and rotated objects. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8232–8241
90.
go back to reference Xiaolin F, Fan H, Ming Y, Tongxin Z, Ran B, Zenghui Z, Zhiyuan G (2022) Small object detection in remote sensing images based on super-resolution. Pattern Recogn Lett 153:107–112ADSCrossRef Xiaolin F, Fan H, Ming Y, Tongxin Z, Ran B, Zenghui Z, Zhiyuan G (2022) Small object detection in remote sensing images based on super-resolution. Pattern Recogn Lett 153:107–112ADSCrossRef
91.
go back to reference Han J, Ding J, Li J, Xia G-S (2021) Align deep features for oriented object detection. IEEE Trans Geosci Remote Sens 60:1–11 Han J, Ding J, Li J, Xia G-S (2021) Align deep features for oriented object detection. IEEE Trans Geosci Remote Sens 60:1–11
92.
go back to reference Xia G-S, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2018) Dota: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3974–3983 Xia G-S, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2018) Dota: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3974–3983
93.
go back to reference Rabbi J, Ray N, Schubert M, Chowdhury S, Chao D (2020) Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network. Remote Sens 12(9):1432ADSCrossRef Rabbi J, Ray N, Schubert M, Chowdhury S, Chao D (2020) Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network. Remote Sens 12(9):1432ADSCrossRef
94.
go back to reference Jiang K, Wang Z, Yi P, Wang G, Lu T, Jiang J (2019) Edge-enhanced GAN for remote sensing image superresolution. IEEE Trans Geosci Remote Sens 57(8):5799–5812ADSCrossRef Jiang K, Wang Z, Yi P, Wang G, Lu T, Jiang J (2019) Edge-enhanced GAN for remote sensing image superresolution. IEEE Trans Geosci Remote Sens 57(8):5799–5812ADSCrossRef
95.
go back to reference Courtrai L, Pham M-T, Lefèvre S (2020) Small object detection in remote sensing images based on super-resolution with auxiliary generative adversarial networks. Remote Sens 12(19):3152ADSCrossRef Courtrai L, Pham M-T, Lefèvre S (2020) Small object detection in remote sensing images based on super-resolution with auxiliary generative adversarial networks. Remote Sens 12(19):3152ADSCrossRef
96.
go back to reference Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. In: Proceedings of the 31st international conference on neural information processing systems, pp 5769–5779 Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. In: Proceedings of the 31st international conference on neural information processing systems, pp 5769–5779
97.
go back to reference Lim B, Son S, Kim H, Nah S, Mu Lee K (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144 Lim B, Son S, Kim H, Nah S, Mu Lee K (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144
98.
go back to reference Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232 Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
100.
go back to reference Braun M, Krebs S, Flohr F, Gavrila DM (2018) The Eurocity persons dataset: a novel benchmark for object detection. arXiv preprint arXiv:1805.07193 Braun M, Krebs S, Flohr F, Gavrila DM (2018) The Eurocity persons dataset: a novel benchmark for object detection. arXiv preprint arXiv:​1805.​07193
101.
go back to reference Stallkamp J, Schlipsing M, Salmen J, Igel C (2012) Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw 32:323–332CrossRefPubMed Stallkamp J, Schlipsing M, Salmen J, Igel C (2012) Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw 32:323–332CrossRefPubMed
102.
go back to reference Zhang S, Benenson R, Schiele B (2017) Citypersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3221 Zhang S, Benenson R, Schiele B (2017) Citypersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3221
103.
go back to reference Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307ADSCrossRef Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307ADSCrossRef
104.
go back to reference Yu X, Gong Y, Jiang N, Ye Q, Han Z (2020) Scale match for tiny person detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1257–1265 Yu X, Gong Y, Jiang N, Ye Q, Han Z (2020) Scale match for tiny person detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1257–1265
105.
go back to reference Bondi E, Jain R, Aggrawal P, Anand S, Hannaford R, Kapoor A, Piavis J, Shah S, Joppa L, Dilkina B, et al (2020) Birdsai: a dataset for detection and tracking in aerial thermal infrared videos. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1747–1756 Bondi E, Jain R, Aggrawal P, Anand S, Hannaford R, Kapoor A, Piavis J, Shah S, Joppa L, Dilkina B, et al (2020) Birdsai: a dataset for detection and tracking in aerial thermal infrared videos. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1747–1756
106.
go back to reference Wan J, Ding W, Zhu H, Xia M, Huang Z, Tian L, Zhu Y, Wang H (2021) An efficient small traffic sign detection method based on yolov3. J Signal Process Syst 93(8):899–911CrossRef Wan J, Ding W, Zhu H, Xia M, Huang Z, Tian L, Zhu Y, Wang H (2021) An efficient small traffic sign detection method based on yolov3. J Signal Process Syst 93(8):899–911CrossRef
107.
go back to reference Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988 Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
108.
go back to reference Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022 Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
109.
go back to reference Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of the 30th international conference on neural information processing systems, pp 379–387 Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of the 30th international conference on neural information processing systems, pp 379–387
110.
go back to reference Azimi SM, Vig E, Bahmanyar R, Körner M, Reinartz P (2018) Towards multi-class object detection in unconstrained remote sensing imagery. In: Asian conference on computer vision. Springer, pp 150–165 Azimi SM, Vig E, Bahmanyar R, Körner M, Reinartz P (2018) Towards multi-class object detection in unconstrained remote sensing imagery. In: Asian conference on computer vision. Springer, pp 150–165
111.
go back to reference Zhang G, Lu S, Zhang W (2019) Cad-net: a context-aware detection network for objects in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(12):10015–10024ADSCrossRef Zhang G, Lu S, Zhang W (2019) Cad-net: a context-aware detection network for objects in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(12):10015–10024ADSCrossRef
112.
go back to reference Dhariwal P, Nichol A (2021) Diffusion models beat GANs on image synthesis. Adv Neural Inf Process Syst 34:8780–8794 Dhariwal P, Nichol A (2021) Diffusion models beat GANs on image synthesis. Adv Neural Inf Process Syst 34:8780–8794
113.
go back to reference Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213–229 Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213–229
114.
go back to reference Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:​2010.​04159
Metadata
Title
A review of small object detection based on deep learning
Authors
Wei Wei
Yu Cheng
Jiafeng He
Xiyue Zhu
Publication date
15-02-2024
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 12/2024
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-024-09422-6

Other articles of this Issue 12/2024

Neural Computing and Applications 12/2024 Go to the issue

Premium Partner