Published in: Neural Computing and Applications 14/2022

14-03-2022 | Original Article

TF-SOD: a novel transformer framework for salient object detection

Authors: Zhenyu Wang, Yunzhou Zhang, Yan Liu, Zhuo Wang, Sonya Coleman, Dermot Kerr

Abstract

Most existing salient object detection models are based on fully convolutional networks (FCNs), which learn multi-scale/multi-level semantic information through convolutional layers to produce high-quality predicted saliency maps. However, convolution is a local operation, so it struggles to capture long-range dependencies, and FCN-based methods consequently suffer from coarse object boundaries. In this paper, to solve these problems, we propose a novel transformer framework for salient object detection (named TF-SOD), which mainly consists of the encoder part of the FCN, a fusion module (FM), a transformer module (TM) and a feature decoder module (FDM). Specifically, the FM is a bridge connecting the encoder and the TM, providing some foresight for the non-local interaction of the TM. Besides, the FDM can efficiently decode the non-local features output by the TM and achieve deep fusion with local features. This architecture enables the network to achieve a close integration of local and non-local interactions, making the two kinds of information complementary and deeply mining the associations between features. Furthermore, we also propose a novel edge reinforcement learning strategy, which can effectively suppress edge blurring from both local and global aspects by means of the powerful network architecture. Extensive experiments on five datasets demonstrate that the proposed method outperforms 19 state-of-the-art methods.
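The abstract only names the modules (encoder, FM, TM, FDM) without giving their internals, so the following is a minimal NumPy sketch of the data flow it describes, under stated assumptions: the encoder's local feature map is a stand-in array, the FM is reduced to flattening the spatial grid into a token sequence, the TM is a single self-attention head (the generic non-local interaction), and the FDM is approximated by channel concatenation. All shapes, names and weights here are illustrative, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=64, seed=0):
    # Single-head scaled dot-product attention: every token attends to
    # every other token, which is the "non-local interaction" the
    # abstract attributes to the transformer module (TM).
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * d ** -0.5 for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))       # (n, n) attention weights
    return attn @ V                              # (n, d_k) non-local features

# Stand-in for the FCN encoder's local feature map (H, W, C).
f_local = np.random.default_rng(1).standard_normal((14, 14, 256))

# "FM" (assumed): collapse the spatial grid into a token sequence.
tokens = f_local.reshape(-1, 256)                # (196, 256)

# "TM": non-local interaction over all tokens.
f_nonlocal = self_attention(tokens)              # (196, 64)

# "FDM" (assumed): fuse non-local features back with the local map;
# concatenation here, whereas the paper describes a learned decoder.
fused = np.concatenate(
    [f_local, f_nonlocal.reshape(14, 14, 64)], axis=-1
)
print(fused.shape)  # (14, 14, 320)
```

The point of the sketch is the complementarity the abstract claims: `f_local` carries fine spatial detail from convolutions, `f_nonlocal` carries image-wide context from attention, and the decoder sees both.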


Metadata
Title
TF-SOD: a novel transformer framework for salient object detection
Authors
Zhenyu Wang
Yunzhou Zhang
Yan Liu
Zhuo Wang
Sonya Coleman
Dermot Kerr
Publication date
14-03-2022
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-022-07069-9
