International Journal of Multimedia Information Retrieval

01-03-2024 | Regular Paper

DAABNet: depth-wise asymmetric attention bottleneck for real-time semantic segmentation

Authors: Qingsong Tang, Yingli Chen, Minghui Zhao, Shitong Min, Wuming Jiang

Published in: International Journal of Multimedia Information Retrieval | Issue 1/2024

Abstract

With the growing demand for real-world applications such as autonomous driving and video surveillance, lightweight semantic segmentation methods that achieve good trade-offs among parameter size, speed, and accuracy have attracted increasing attention. In this context, we propose a novel real-time semantic segmentation model. First, we design a two-branch depth-wise asymmetric attention bottleneck (DAAB), based on the residual network, to reduce the number of parameters and improve inference speed. In particular, an attention refinement module (ARM) is added to the DAAB module so that the information extracted from the two branches complements each other. Second, we design a strip pooling attention (SPA) module, which combines the strip pooling module with the attention mechanism to focus on strip-shaped objects and to establish long-range dependencies between discretely distributed regions, thereby addressing the poor segmentation of strip-shaped objects. In addition, we fuse information from different stages to compensate for the loss of spatial information, improving the network's ability to segment small objects. Experiments on the CityScapes and CamVid datasets demonstrate that the proposed method achieves impressive trade-offs among parameter size, speed, and accuracy. Code is available at: https://github.com/mhhz/DAABnet1.
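To illustrate why a depth-wise asymmetric design reduces parameters, the sketch below counts the weights of three layer types for the same channel width. It is not the authors' implementation: the abstract does not give the exact DAAB layer configuration, so the function names, the channel count of 64, and the kernel size of 3 are illustrative assumptions, and bias terms are ignored.

```python
# Hypothetical helper functions: weight counts only, biases ignored.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution."""
    return c_in * c_out * k * k


def depthwise_conv_params(channels: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution (one filter per channel)."""
    return channels * k * k


def depthwise_asymmetric_params(channels: int, k: int) -> int:
    """Weights in a depth-wise k x 1 convolution followed by a depth-wise 1 x k one."""
    return 2 * channels * k


if __name__ == "__main__":
    channels, kernel = 64, 3  # assumed values, chosen only for illustration
    print("standard 3x3:            ", standard_conv_params(channels, channels, kernel))
    print("depth-wise 3x3:          ", depthwise_conv_params(channels, kernel))
    print("depth-wise 3x1 then 1x3: ", depthwise_asymmetric_params(channels, kernel))
```

Under these assumptions the standard layer needs 36,864 weights, the depth-wise layer 576, and the asymmetric depth-wise pair 384. A real bottleneck also needs point-wise convolutions to mix channels, so the overall saving in a full block is smaller than this isolated comparison suggests.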

In this case, DAABNet obtains 68.8% mIoU at 92.68 FPS with 0.94M parameters. The model achieves a good balance between the low-resolution CamVid dataset and the high-resolution CityScapes dataset.
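For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from flat lists of predicted and ground-truth labels. The function name and the toy labels are assumptions made for illustration. Benchmark evaluations typically accumulate counts over a whole dataset and may handle ignored labels differently, so this does not reproduce the paper's evaluation protocol.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            # Class appears in neither prediction nor ground truth.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six pixels (made-up labels).
    predicted = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 0]
    print(f"mIoU = {mean_iou(predicted, ground_truth, num_classes=3):.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.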

Title: DAABNet: depth-wise asymmetric attention bottleneck for real-time semantic segmentation
Authors: Qingsong Tang, Yingli Chen, Minghui Zhao, Shitong Min, Wuming Jiang
Publication date: 01-03-2024
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 1/2024
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-024-00321-z
