Skip to main content
Erschienen in: Neural Processing Letters 3/2020

24.01.2020

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

verfasst von: Yifu Liu, Chenfeng Xu, Zhihong Chen, Chao Chen, Han Zhao, Xinyu Jin

Erschienen in: Neural Processing Letters | Ausgabe 3/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The fusion of multi-scale features has been an effective method to get state-of-the-art performance in semantic segmentation. In this work, we concentrate on two tricky problems—the intra-class inconsistency and the blur on the localization of object boundaries and tackle them by combining two separate multi-scale context features respectively. Specifically, we propose a dual-stream structure with the scale context selection attention module to enhance the capabilities for multi-scale processing, where one stream collects global-scale context and the other captures local-scale information. Meanwhile, the embedded scale context selection attention module in each stream can adaptively focus on different scale context information to get optimal scale features. Based on our dual-stream structure with attention modules, our network can efficiently make use of multi-scale context to generate more comprehensive and powerful features. Our experiments show that our dual-stream network with scale context selection attention module achieves promising performance on the PASCAL VOC 2012 and PASCAL-Person-Part datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495CrossRef Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495CrossRef
2.
Zurück zum Zitat Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv e-prints, arXiv:1409.0473 Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv e-prints, arXiv:​1409.​0473
3.
Zurück zum Zitat Bansal A, Chen X, Russell B, Gupta A, Ramanan D (2017) PixelNet: representation of the pixels, by the pixels, and for the pixels. arXiv e-prints, arXiv:1702.06506 Bansal A, Chen X, Russell B, Gupta A, Ramanan D (2017) PixelNet: representation of the pixels, by the pixels, and for the pixels. arXiv e-prints, arXiv:​1702.​06506
4.
Zurück zum Zitat Buyssens P, Elmoataz A, Lézoray O (2012) Multiscale convolutional neural networks for vision-based classification of cells. In: Lee KM, Matsushita Y, Rehg JM, Hu Z (eds) Computer vision—ACCV 2012. Springer, Berlin, pp 342–352 Buyssens P, Elmoataz A, Lézoray O (2012) Multiscale convolutional neural networks for vision-based classification of cells. In: Lee KM, Matsushita Y, Rehg JM, Hu Z (eds) Computer vision—ACCV 2012. Springer, Berlin, pp 342–352
5.
Zurück zum Zitat Chen L.-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv e-prints, arXiv:1412.7062 Chen L.-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv e-prints, arXiv:​1412.​7062
6.
Zurück zum Zitat Chen L.-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2016) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv e-prints, arXiv:1606.00915 Chen L.-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2016) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv e-prints, arXiv:​1606.​00915
7.
Zurück zum Zitat Chen L.-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv e-prints, arXiv:1706.05587 Chen L.-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv e-prints, arXiv:​1706.​05587
8.
Zurück zum Zitat Chen L.-C, Papandreou G, Yuille AL (2013) Learning a dictionary of shape epitomes with applications to image labeling. In: 2013 IEEE international conference on computer vision. IEEE Chen L.-C, Papandreou G, Yuille AL (2013) Learning a dictionary of shape epitomes with applications to image labeling. In: 2013 IEEE international conference on computer vision. IEEE
9.
Zurück zum Zitat Chen L.-C, Yang Y, Wang J, Xu W, Yuille AL (2015) Attention to scale: scale-aware semantic image segmentation. arXiv e-prints, arXiv:1511.03339 Chen L.-C, Yang Y, Wang J, Xu W, Yuille AL (2015) Attention to scale: scale-aware semantic image segmentation. arXiv e-prints, arXiv:​1511.​03339
10.
Zurück zum Zitat Chen X, Mottaghi R, Liu X, Fidler S, Urtasun R, Yuille A (2014) Detect what you can: detecting and representing objects using holistic models and body parts. arXiv e-prints, arXiv:1406.2031 Chen X, Mottaghi R, Liu X, Fidler S, Urtasun R, Yuille A (2014) Detect what you can: detecting and representing objects using holistic models and body parts. arXiv e-prints, arXiv:​1406.​2031
12.
Zurück zum Zitat Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136CrossRef Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136CrossRef
13.
Zurück zum Zitat Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929CrossRef Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929CrossRef
14.
Zurück zum Zitat Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2018) Dual attention network for scene segmentation. arXiv e-prints, arXiv:1809.02983 Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2018) Dual attention network for scene segmentation. arXiv e-prints, arXiv:​1809.​02983
15.
Zurück zum Zitat Ganin Y, Lempitsky V (2015) N4-fields: neural network nearest neighbor fields for image transforms. In: Cremers D, Reid I, Saito H, Yang M-H (eds) Computer vision—ACCV 2014. Springer, Cham, pp 536–551 Ganin Y, Lempitsky V (2015) N4-fields: neural network nearest neighbor fields for image transforms. In: Cremers D, Reid I, Saito H, Yang M-H (eds) Computer vision—ACCV 2014. Springer, Cham, pp 536–551
16.
Zurück zum Zitat Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Rodríguez JG (2017) A review on deep learning techniques applied to semantic segmentation. CoRR, arXiv:1704.06857 Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Rodríguez JG (2017) A review on deep learning techniques applied to semantic segmentation. CoRR, arXiv:​1704.​06857
17.
Zurück zum Zitat Ghiasi G, Fowlkes CC (2016) Laplacian pyramid reconstruction and refinement for semantic segmentation. arXiv e-prints, arXiv:1605.02264 Ghiasi G, Fowlkes CC (2016) Laplacian pyramid reconstruction and refinement for semantic segmentation. arXiv e-prints, arXiv:​1605.​02264
18.
Zurück zum Zitat Hariharan B, Arbelaez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: 2011 international conference on computer vision. IEEE Hariharan B, Arbelaez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: 2011 international conference on computer vision. IEEE
19.
Zurück zum Zitat He C, Hu H (2018) Image captioning with text-based visual attention. Neural Process Lett 49(1):177–185CrossRef He C, Hu H (2018) Image captioning with text-based visual attention. Neural Process Lett 49(1):177–185CrossRef
20.
Zurück zum Zitat He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, arXiv:1406.4729 He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, arXiv:​1406.​4729
21.
22.
Zurück zum Zitat Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670MathSciNetCrossRef Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670MathSciNetCrossRef
23.
Zurück zum Zitat Hong C, Yu J, Zhang J, Jin X, Lee K (2019) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inf 15(7):3952–3961CrossRef Hong C, Yu J, Zhang J, Jin X, Lee K (2019) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inf 15(7):3952–3961CrossRef
25.
Zurück zum Zitat Kim J, Bukhari W, Lee M (2017) Feature analysis of unsupervised learning for multi-task classification using convolutional neural network. Neural Process Lett 47(3):783–797CrossRef Kim J, Bukhari W, Lee M (2017) Feature analysis of unsupervised learning for multi-task classification using convolutional neural network. Neural Process Lett 47(3):783–797CrossRef
26.
Zurück zum Zitat Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Neural Inf Process Syst 25:01 Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Neural Inf Process Syst 25:01
27.
Zurück zum Zitat Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2, pp 2169–2178 Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2, pp 2169–2178
29.
30.
Zurück zum Zitat Liang X, Shen X, Xiang D, Feng J, Lin L, Yan S (2015) Semantic object parsing with local-global long short-term memory. arXiv e-prints, arXiv:1511.04510 Liang X, Shen X, Xiang D, Feng J, Lin L, Yan S (2015) Semantic object parsing with local-global long short-term memory. arXiv e-prints, arXiv:​1511.​04510
31.
Zurück zum Zitat Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: The IEEE conference on computer vision and pattern recognition (CVPR) Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: The IEEE conference on computer vision and pattern recognition (CVPR)
32.
Zurück zum Zitat Lin G, Shen C, van dan Hengel A, Reid I (2015) Efficient piecewise training of deep structured models for semantic segmentation. arXiv e-prints, arXiv:1504.01013 Lin G, Shen C, van dan Hengel A, Reid I (2015) Efficient piecewise training of deep structured models for semantic segmentation. arXiv e-prints, arXiv:​1504.​01013
34.
Zurück zum Zitat Liu Z, Li X, Luo P, Change Loy C, Tang X (2015) Semantic image segmentation via deep parsing network. arXiv e-prints, arXiv:1509.02634 Liu Z, Li X, Luo P, Change Loy C, Tang X (2015) Semantic image segmentation via deep parsing network. arXiv e-prints, arXiv:​1509.​02634
35.
36.
Zurück zum Zitat Neverova N, Wolf C, Taylor GW, Nebout F (2015) Multi-scale deep learning for gesture detection and localization. In: Agapito L, Bronstein MM, Rother C (eds) Computer vision—ECCV 2014 workshops. Springer, Cham, pp 474–490CrossRef Neverova N, Wolf C, Taylor GW, Nebout F (2015) Multi-scale deep learning for gesture detection and localization. In: Agapito L, Bronstein MM, Rother C (eds) Computer vision—ECCV 2014 workshops. Springer, Cham, pp 474–490CrossRef
37.
Zurück zum Zitat Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: 2015 IEEE international conference on computer vision (ICCV). IEEE Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: 2015 IEEE international conference on computer vision (ICCV). IEEE
38.
Zurück zum Zitat Papandreou G, Chen L.-C, Murphy K, Yuille AL (2015) Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. arXiv e-prints, arXiv:1502.02734 Papandreou G, Chen L.-C, Murphy K, Yuille AL (2015) Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. arXiv e-prints, arXiv:​1502.​02734
39.
Zurück zum Zitat Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters—improve semantic segmentation by global convolutional network. In: The IEEE conference on computer vision and pattern recognition (CVPR) Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters—improve semantic segmentation by global convolutional network. In: The IEEE conference on computer vision and pattern recognition (CVPR)
40.
Zurück zum Zitat Pohlen T, Hermans A, Mathias M, Leibe B (2017) Full-resolution residual networks for semantic segmentation in street scenes. In: The IEEE conference on computer vision and pattern recognition (CVPR) Pohlen T, Hermans A, Mathias M, Leibe B (2017) Full-resolution residual networks for semantic segmentation in street scenes. In: The IEEE conference on computer vision and pattern recognition (CVPR)
41.
Zurück zum Zitat Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651CrossRef Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651CrossRef
42.
Zurück zum Zitat Shuai B, Zuo Z, Wang B, Wang G (2018) Scene segmentation with DAG-recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 40(6):1480–1493CrossRef Shuai B, Zuo Z, Wang B, Wang G (2018) Scene segmentation with DAG-recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 40(6):1480–1493CrossRef
43.
Zurück zum Zitat Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv e-prints, arXiv:1706.03762 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv e-prints, arXiv:​1706.​03762
44.
Zurück zum Zitat Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2017) Understanding convolution for semantic segmentation. arXiv e-prints, arXiv:1702.08502 Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2017) Understanding convolution for semantic segmentation. arXiv e-prints, arXiv:​1702.​08502
46.
47.
Zurück zum Zitat Xia F, Wang P, Chen L.-C, Yuille AL (2015) Zoom better to see clearer: human and object parsing with hierarchical auto-zoom net. arXiv e-prints, arXiv:1511.06881 Xia F, Wang P, Chen L.-C, Yuille AL (2015) Zoom better to see clearer: human and object parsing with hierarchical auto-zoom net. arXiv e-prints, arXiv:​1511.​06881
48.
Zurück zum Zitat Xiao Y, Codevilla F, Gurram A, Urfalioglu O, López AM (2019) Multimodal end-to-end autonomous driving. arXiv e-prints, arXiv:1906.03199 Xiao Y, Codevilla F, Gurram A, Urfalioglu O, López AM (2019) Multimodal end-to-end autonomous driving. arXiv e-prints, arXiv:​1906.​03199
49.
Zurück zum Zitat Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. arXiv e-prints, arXiv:1502.03044 Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. arXiv e-prints, arXiv:​1502.​03044
50.
Zurück zum Zitat Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Learning a discriminative feature network for semantic segmentation. arXiv e-prints, arXiv:1804.09337 Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Learning a discriminative feature network for semantic segmentation. arXiv e-prints, arXiv:​1804.​09337
51.
Zurück zum Zitat Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032MathSciNetCrossRef Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032MathSciNetCrossRef
53.
Zurück zum Zitat Yu J, Yang X, Gao F, Tao D (2017) Deep multimodal distance metric learning using click constraints for image ranking. IEEE Trans Cybern 47(12):4014–4024CrossRef Yu J, Yang X, Gao F, Tao D (2017) Deep multimodal distance metric learning using click constraints for image ranking. IEEE Trans Cybern 47(12):4014–4024CrossRef
54.
Zurück zum Zitat Yu J, Zhang B, Kuang Z, Lin D, Fan J (2017) iPrivacy: image privacy protection by identifying sensitive objects via deep multi-task learning. IEEE Trans Inf Forensics Secur 12(5):1005–1016CrossRef Yu J, Zhang B, Kuang Z, Lin D, Fan J (2017) iPrivacy: image privacy protection by identifying sensitive objects via deep multi-task learning. IEEE Trans Inf Forensics Secur 12(5):1005–1016CrossRef
56.
Zurück zum Zitat Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. arXiv e-prints, arXiv:1803.08904 Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. arXiv e-prints, arXiv:​1803.​08904
57.
Zurück zum Zitat Zhang J, Yu J, Tao D (2018) Local deep-feature alignment for unsupervised dimension reduction. IEEE Trans Image Process 27(5):2420–2432MathSciNetCrossRef Zhang J, Yu J, Tao D (2018) Local deep-feature alignment for unsupervised dimension reduction. IEEE Trans Image Process 27(5):2420–2432MathSciNetCrossRef
58.
Zurück zum Zitat Zhang W, Hu H, Hu H (2018) Training visual-semantic embedding network for boosting automatic image annotation. Neural Process Lett 48(3):1503–1519CrossRef Zhang W, Hu H, Hu H (2018) Training visual-semantic embedding network for boosting automatic image annotation. Neural Process Lett 48(3):1503–1519CrossRef
60.
Zurück zum Zitat Zhao H, Zhang Y, Liu S, Shi J, Loy CC, Lin D, Jia J (2018) Psanet: point-wise spatial attention network for scene parsing. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision—ECCV 2018. Springer, Cham, pp 270–286 Zhao H, Zhang Y, Liu S, Shi J, Loy CC, Lin D, Jia J (2018) Psanet: point-wise spatial attention network for scene parsing. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision—ECCV 2018. Springer, Cham, pp 270–286
61.
Zurück zum Zitat Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, Huang C, Torr PHS (2015) Conditional random fields as recurrent neural networks. arXiv e-prints, arXiv:1502.03240 Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, Huang C, Torr PHS (2015) Conditional random fields as recurrent neural networks. arXiv e-prints, arXiv:​1502.​03240
Metadaten
Titel
Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation
verfasst von
Yifu Liu
Chenfeng Xu
Zhihong Chen
Chao Chen
Han Zhao
Xinyu Jin
Publikationsdatum
24.01.2020
Verlag
Springer US
Erschienen in
Neural Processing Letters / Ausgabe 3/2020
Print ISSN: 1370-4621
Elektronische ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-019-10148-z

Weitere Artikel der Ausgabe 3/2020

Neural Processing Letters 3/2020 Zur Ausgabe

Neuer Inhalt