Published in: Multimedia Systems 1/2024

01-02-2024 | Regular Paper

NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism

Authors: Chengang Dong, Yuhao Tang, Liyan Zhang

Abstract

The primary objective of deep learning-based instance segmentation is to accurately segment individual objects in input images or videos. However, challenges such as feature loss from down-sampling operations, together with occlusion, deformation, and complex backgrounds, impede the precise delineation of object instance boundaries. To address these challenges, we introduce a novel visual attention network, the Normalized Deep Attention Mechanism (NDAM), into the YOLOv8-seg instance segmentation model, proposing a real-time instance segmentation method named NDAM-YOLOseg. Specifically, we optimize the feature processing of YOLOv8-seg to mitigate the accuracy degradation caused by information loss. Additionally, we introduce the NDAM to sharpen the model's discriminative focus on pivotal information, further improving segmentation accuracy. Furthermore, a Boundary Refinement Module (BRM) is designed to enhance the segmentation of instance boundaries, resulting in higher-quality mask generation. Our proposed method demonstrates competitive performance on multiple evaluation metrics across two widely used benchmark datasets, MS COCO 2017 and KINS. Compared to the baseline model YOLOv8x-seg, NDAM-YOLOseg achieves improvements of 2.4% and 2.5% in Average Precision (AP) on these datasets, respectively.
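The multi-head attention idea underlying the NDAM can be illustrated with a minimal sketch. This is not the authors' implementation: the weights are random placeholders, and the normalization step (L2-normalizing queries and keys before the dot product) is only an assumed stand-in for the paper's normalization strategy, used here to show where such a step would sit in a standard multi-head attention block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def normalized_multi_head_attention(x, num_heads=4, seed=0):
    """Multi-head self-attention over flattened feature-map tokens.

    x: (tokens, dim) array. All projection weights are random
    placeholders; the query/key normalization below is an assumption,
    not the paper's NDAM definition.
    """
    tokens, dim = x.shape
    head_dim = dim // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((dim, dim)) / np.sqrt(dim)
                      for _ in range(4))

    def split(h):  # (tokens, dim) -> (heads, tokens, head_dim)
        return h.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    # Hypothetical normalization step: L2-normalize queries and keys
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)  # each row sums to 1
    out = (attn @ v).transpose(1, 0, 2).reshape(tokens, dim)
    return out @ Wo

x = np.random.default_rng(1).standard_normal((16, 32))
y = normalized_multi_head_attention(x, num_heads=4)
print(y.shape)  # (16, 32)
```

In a segmentation backbone, `x` would be the spatial positions of a feature map flattened into tokens; the output has the same shape, so the block can be dropped into the feature-processing path without changing downstream layer sizes.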


Metadata
Title
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism
Authors
Chengang Dong
Yuhao Tang
Liyan Zhang
Publication date
01-02-2024
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 1/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-023-01212-9
