Published in: Neural Computing and Applications 4/2024

15.11.2023 | Original Article

Mitigate the scale imbalance via multi-scale information interaction in small object detection

Authors: Enhui Chai, Li Chen, Xingxing Hao, Wei Zhou

Abstract

The scale imbalance between the backbone and the neck is a main cause of the inferior accuracy of general object detectors on small objects. A general object detector typically pairs a complex backbone with a lightweight neck: the complex backbone consumes large computational resources, while the lightweight neck struggles to exchange deep semantic and shallow spatial information. As a result, the general object detector suffers from severe scale imbalance when detecting small objects. Motivated by this, we propose a novel detector, named IUDet, which instead combines a lightweight backbone with a complex neck. In the lightweight backbone, a novel sampling strategy named pixel-spanning merge (PSM) is proposed to save computational cost; at the same time, it transfers features from the scale dimension to the spatial dimension, thereby enhancing information interaction. Moreover, the neck is designed with element-wise summation of multi-scale features and an inverted U-shaped skip connection to improve the feature representation of small objects. Experimental results show that IUDet outperforms the most popular detectors in small object detection on MS COCO 2017 and VisDrone-DET2019.
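
To make the two ideas in the abstract more concrete, below is a minimal PyTorch sketch, written under assumptions since the paper's implementation details are not given here: the pixel-spanning merge (PSM) is approximated by a space-to-depth style downsampling (F.pixel_unshuffle followed by a 1x1 projection), and the neck is approximated by element-wise summation of resized feature levels with a skip connection that re-adds the shallow level. The module and parameter names (SpaceToDepthDown, SumFusionNeck, refine) are illustrative, not the authors' code.

    # Hedged sketch only: stand-ins for PSM-style downsampling and sum-based multi-scale fusion.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpaceToDepthDown(nn.Module):
        """Downsample by rearranging 2x2 spatial blocks into channels, then project with a 1x1 conv.
        Unlike strided convolution or pooling, no activations are discarded during downsampling."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.proj = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = F.pixel_unshuffle(x, downscale_factor=2)  # (B, 4*C, H/2, W/2)
            return self.proj(x)

    class SumFusionNeck(nn.Module):
        """Fuse three feature levels by element-wise sum at the finest resolution,
        then add a skip connection from the shallow level to retain spatial detail."""
        def __init__(self, ch: int):
            super().__init__()
            self.refine = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

        def forward(self, p3, p4, p5):
            size = p3.shape[-2:]
            up4 = F.interpolate(p4, size=size, mode="nearest")
            up5 = F.interpolate(p5, size=size, mode="nearest")
            fused = p3 + up4 + up5          # element-wise sum of aligned multi-scale features
            return self.refine(fused) + p3  # skip connection back to the shallow level

    if __name__ == "__main__":
        down = SpaceToDepthDown(64, 128)
        y = down(torch.randn(1, 64, 128, 128))          # -> (1, 128, 64, 64)
        neck = SumFusionNeck(64)
        p3, p4, p5 = (torch.randn(1, 64, s, s) for s in (80, 40, 20))
        out = neck(p3, p4, p5)                          # -> (1, 64, 80, 80)
        print(y.shape, out.shape)

The design intent sketched here matches the abstract's motivation: downsampling by rearrangement keeps fine-grained information available to deeper layers at low cost, and summation plus a skip connection lets deep semantic and shallow spatial features interact at the resolution where small objects are resolved.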


Metadata
Title
Mitigate the scale imbalance via multi-scale information interaction in small object detection
Authors
Enhui Chai
Li Chen
Xingxing Hao
Wei Zhou
Publication date
15.11.2023
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2024
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-023-09122-7
