26-04-2024 | Original Article

Enhancing the ability of convolutional neural networks for remote sensing image segmentation using transformers

Author: Mohammad Barr

Published in: Neural Computing and Applications

Abstract

Remote sensing image segmentation has become an important task in computer vision because it underpins a wide range of applications. The U-Net architecture has been used extensively for image segmentation and has achieved remarkable results. However, U-Net has several limitations for remote sensing image segmentation, stemming mainly from the limited receptive field of its convolution kernels. The transformer is a deep learning model originally developed for sequence-to-sequence translation; its self-attention mechanism processes many inputs efficiently, weighting relevant information up and irrelevant information down. On its own, however, the transformer lacks low-level features and therefore has limited localization capability. This work presents a novel approach, the U-Net–transformer, which combines the U-Net and transformer models for remote sensing image segmentation. The proposed method outperforms each individual model by combining their complementary strengths. First, the transformer captures global context by encoding tokenized image patches derived from the feature maps of the convolutional neural network (CNN). The encoded feature maps are then upsampled by a decoder and merged with the high-resolution feature maps of the CNN, enabling more accurate localization. The transformer thus serves as an alternative encoder for remote sensing image segmentation, complementing the U-Net's ability to capture localized spatial detail. The proposed U-Net–transformer achieves excellent performance on several benchmark remote sensing segmentation datasets, and the reported results demonstrate the effectiveness of integrating the U-Net and transformer models for this task.
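
To make the encoder–decoder fusion described above concrete, the following minimal sketch (in PyTorch; the framework choice, layer widths, and module names such as UNetTransformer are illustrative assumptions rather than the paper's implementation) shows how the deepest CNN feature map can be flattened into tokens, passed through a transformer encoder for global context, and then upsampled and concatenated with the high-resolution CNN skip features for localization.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, as in a standard U-Net stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class UNetTransformer(nn.Module):
    """Hypothetical U-Net + transformer hybrid: CNN encoder -> transformer
    bottleneck over tokenized feature-map positions -> U-Net-style decoder
    with skip connections to the high-resolution CNN features."""
    def __init__(self, in_ch=3, num_classes=6, base=64, depth=4, heads=8):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, base)          # high-resolution skip features
        self.enc2 = ConvBlock(base, base * 2)
        self.enc3 = ConvBlock(base * 2, base * 4)   # feature maps to be tokenized
        self.pool = nn.MaxPool2d(2)
        # Each spatial position of the deepest CNN feature map becomes one token.
        layer = nn.TransformerEncoderLayer(d_model=base * 4, nhead=heads,
                                           dim_feedforward=base * 8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = ConvBlock(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ConvBlock(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                 # (B, base,   H,   W)
        s2 = self.enc2(self.pool(s1))     # (B, 2*base, H/2, W/2)
        f = self.enc3(self.pool(s2))      # (B, 4*base, H/4, W/4)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)         # (B, h*w, c): tokenized patches
        tokens = self.transformer(tokens)             # global context via self-attention
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        d2 = self.dec2(torch.cat([self.up2(f), s2], dim=1))  # merge with CNN skip features
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))
        return self.head(d1)              # per-pixel class logits
```

Positional embeddings are omitted for brevity; in practice they would be added to the tokens before the transformer, and the CNN encoder would typically be deeper (for example, a pretrained backbone) than the two-stage encoder shown here.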

Metadata
Title
Enhancing the ability of convolutional neural networks for remote sensing image segmentation using transformers
Author
Mohammad Barr
Publication date
26-04-2024
Publisher
Springer London
Published in
Neural Computing and Applications
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-024-09743-6
