Published in: Neural Computing and Applications 4/2024

15.11.2023 | Original Article

Mitigate the scale imbalance via multi-scale information interaction in small object detection

Authors: Enhui Chai, Li Chen, Xingxing Hao, Wei Zhou

Abstract

The scale imbalance between the backbone and the neck is a main cause of the inferior accuracy of general object detectors on small objects. A general object detector typically pairs a complex backbone with a lightweight neck: the complex backbone consumes large computational resources, while the lightweight neck struggles to exchange deep semantic and shallow spatial information. As a result, the general object detector suffers from severe scale imbalance when detecting small objects. Motivated by this, we propose a novel detector, named IUDet, which instead combines a lightweight backbone with a complex neck. In the lightweight backbone, a novel sampling strategy named pixel-spanning merge (PSM) is proposed to save computational cost; at the same time, it transfers features from the scale dimension to the spatial dimension, thereby enhancing information interaction. Moreover, the neck is designed with element-wise summation of multi-scale features and an inverted U-shaped skip connection to improve the feature representation of small objects. Experimental results show that IUDet outperforms the most popular detectors in small object detection on MS COCO 2017 and VisDrone-DET2019.
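
To make the two ideas in the abstract more concrete, below is a minimal PyTorch sketch, written under assumptions since the paper's implementation details are not given here: the pixel-spanning merge (PSM) is approximated by a space-to-depth style downsampling (F.pixel_unshuffle followed by a 1x1 projection), and the neck is approximated by element-wise summation of resized feature levels with a skip connection that re-adds the shallow level. The module and parameter names (SpaceToDepthDown, SumFusionNeck, refine) are illustrative, not the authors' code.

    # Hedged sketch only: stand-ins for PSM-style downsampling and sum-based multi-scale fusion.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpaceToDepthDown(nn.Module):
        """Downsample by rearranging 2x2 spatial blocks into channels, then project with a 1x1 conv.
        Unlike strided convolution or pooling, no activations are discarded during downsampling."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.proj = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = F.pixel_unshuffle(x, downscale_factor=2)  # (B, 4*C, H/2, W/2)
            return self.proj(x)

    class SumFusionNeck(nn.Module):
        """Fuse three feature levels by element-wise sum at the finest resolution,
        then add a skip connection from the shallow level to retain spatial detail."""
        def __init__(self, ch: int):
            super().__init__()
            self.refine = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

        def forward(self, p3, p4, p5):
            size = p3.shape[-2:]
            up4 = F.interpolate(p4, size=size, mode="nearest")
            up5 = F.interpolate(p5, size=size, mode="nearest")
            fused = p3 + up4 + up5          # element-wise sum of aligned multi-scale features
            return self.refine(fused) + p3  # skip connection back to the shallow level

    if __name__ == "__main__":
        down = SpaceToDepthDown(64, 128)
        y = down(torch.randn(1, 64, 128, 128))          # -> (1, 128, 64, 64)
        neck = SumFusionNeck(64)
        p3, p4, p5 = (torch.randn(1, 64, s, s) for s in (80, 40, 20))
        out = neck(p3, p4, p5)                          # -> (1, 64, 80, 80)
        print(y.shape, out.shape)

The design intent sketched here matches the abstract's motivation: downsampling by rearrangement keeps fine-grained information available to deeper layers at low cost, and summation plus a skip connection lets deep semantic and shallow spatial features interact at the resolution where small objects are resolved.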


Metadata
Title
Mitigate the scale imbalance via multi-scale information interaction in small object detection
Authors
Enhui Chai
Li Chen
Xingxing Hao
Wei Zhou
Publication date
15.11.2023
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2024
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-023-09122-7
