Published in: Neural Processing Letters, 24-01-2020

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Authors: Yifu Liu, Chenfeng Xu, Zhihong Chen, Chao Chen, Han Zhao, Xinyu Jin

Published in: Neural Processing Letters | Issue 3/2020

Abstract

Fusing multi-scale features is an effective way to achieve state-of-the-art performance in semantic segmentation. In this work, we concentrate on two difficult problems, intra-class inconsistency and blurred localization of object boundaries, and tackle them by combining two separate multi-scale context features. Specifically, we propose a dual-stream structure with a scale context selection attention module to strengthen multi-scale processing: one stream collects global-scale context and the other captures local-scale information. The scale context selection attention module embedded in each stream adaptively focuses on context information at different scales to obtain optimal scale features. With this dual-stream structure and its attention modules, our network makes efficient use of multi-scale context to generate more comprehensive and powerful features. Our experiments show that the dual-stream network with the scale context selection attention module achieves promising performance on the PASCAL VOC 2012 and PASCAL-Person-Part datasets.
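
The idea of selecting among scale contexts inside each of two streams can be illustrated with a small sketch. The abstract does not say how the attention weights are computed or how the two streams are merged, so everything below is an assumption made for illustration: the softmax over scales, the dot-product scoring vector, the concatenation of the two streams, and the names `select_scale_context` and `dual_stream_features` are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_scale_context(scale_features, scoring_weights):
    """Fuse per-scale feature vectors with attention weights over scales.

    scale_features: list of S vectors (each a list of C floats), one per scale.
    scoring_weights: list of C floats; a stand-in for learned parameters
        (assumption: one dot-product score per scale).
    Returns (fused_vector, attention_weights).
    """
    # One scalar score per scale.
    scores = [
        sum(w * x for w, x in zip(scoring_weights, feat))
        for feat in scale_features
    ]
    # Assumption: softmax turns the scores into weights that sum to 1.
    attention = softmax(scores)
    channels = len(scale_features[0])
    # Attention-weighted sum of the scale features, channel by channel.
    fused = [
        sum(a * feat[c] for a, feat in zip(attention, scale_features))
        for c in range(channels)
    ]
    return fused, attention


def dual_stream_features(global_scales, local_scales, global_w, local_w):
    """Run scale selection in each stream, then merge the two results.

    Assumption: the streams are merged by concatenation.
    """
    global_feat, _ = select_scale_context(global_scales, global_w)
    local_feat, _ = select_scale_context(local_scales, local_w)
    return global_feat + local_feat  # list concatenation


if __name__ == "__main__":
    global_scales = [[1.0, 0.0], [0.0, 1.0]]  # two toy global-scale features
    local_scales = [[2.0, 2.0], [4.0, 0.0]]   # two toy local-scale features
    merged = dual_stream_features(
        global_scales, local_scales, [0.0, 0.0], [0.0, 0.0]
    )
    print(merged)
```

With all-zero scoring weights every scale receives the same attention, so each stream reduces to a plain average of its scale features. Non-zero weights shift the average toward the scales that score higher.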

https://github.com/YifuLiuL/DSCANet.

Title: Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation
Authors: Yifu Liu, Chenfeng Xu, Zhihong Chen, Chao Chen, Han Zhao, Xinyu Jin
Publication date: 24-01-2020
Publisher: Springer US
Published in: Neural Processing Letters / Issue 3/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-019-10148-z
