Published in: Multimedia Systems 1/2024

01.02.2024 | Regular Paper

NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism

Authors: Chengang Dong, Yuhao Tang, Liyan Zhang


Abstract

The primary objective of deep learning-based instance segmentation is to accurately segment individual objects in input images or videos. However, challenges such as feature loss resulting from down-sampling operations, as well as complications arising from occlusion, deformation, and complex backgrounds, impede the precise delineation of object instance boundaries. To address these challenges, we introduce a novel visual attention network, the Normalized Deep Attention Mechanism (NDAM), into the YOLOv8-seg instance segmentation model, proposing a real-time instance segmentation method named NDAM-YOLOseg. Specifically, we optimize the feature-processing pipeline of YOLOv8-seg to mitigate the degradation in accuracy caused by information loss. Additionally, we introduce the NDAM to sharpen the model's discriminative focus on pivotal information, thereby further improving segmentation accuracy. Furthermore, a Boundary Refinement Module (BRM) is designed to refine the segmentation of instance boundaries, improving the quality of the generated masks. Our proposed method demonstrates competitive performance on multiple evaluation metrics across two widely used benchmark datasets, MS COCO 2017 and KINS. In comparison with the baseline model YOLOv8x-seg, NDAM-YOLOseg achieves noteworthy improvements of 2.4% and 2.5% in Average Precision (AP) on these datasets, respectively.
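The abstract describes NDAM as a normalized multi-head attention mechanism but gives no implementation details. The general idea — normalizing features per head before computing attention weights, so that no single channel dominates the similarity scores — can be sketched as follows. This is a minimal illustrative sketch only, not the paper's actual NDAM: the function name, identity (unlearned) projections, and the choice of layer normalization over queries and keys are all assumptions made here for brevity.

```python
import numpy as np

def _softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def _normalize(x, eps=1e-6):
    # Zero-mean, unit-variance normalization over the feature dimension,
    # so attention scores are not dominated by large-magnitude channels.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def normalized_multi_head_attention(x, num_heads=4):
    """Hypothetical sketch of normalized multi-head self-attention.

    x: (n_tokens, d_model) feature matrix. Uses identity projections for
    queries/keys/values per head; a real module would learn these weights.
    """
    n, d = x.shape
    assert d % num_heads == 0, "d_model must be divisible by num_heads"
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        chunk = x[:, h * dh:(h + 1) * dh]
        q = _normalize(chunk)            # normalize queries before scoring
        k = _normalize(chunk)            # normalize keys before scoring
        v = chunk
        attn = _softmax(q @ k.T / np.sqrt(dh))  # (n, n) attention weights
        heads.append(attn @ v)           # (n, dh) attended values
    # Concatenate all heads back to the model dimension.
    return np.concatenate(heads, axis=-1)
```

The output has the same shape as the input, so a block like this could in principle be dropped into a feature pyramid stage of a YOLO-style backbone without changing downstream tensor shapes.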


Metadata
Title
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism
Authors
Chengang Dong
Yuhao Tang
Liyan Zhang
Publication date
01.02.2024
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 1/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-023-01212-9
