Published in: Neural Computing and Applications 14/2022

14-03-2022 | Original Article

TF-SOD: a novel transformer framework for salient object detection

Authors: Zhenyu Wang, Yunzhou Zhang, Yan Liu, Zhuo Wang, Sonya Coleman, Dermot Kerr

Abstract

Most existing salient object detection models are based on fully convolutional networks (FCNs), which learn multi-scale/multi-level semantic information through convolutional layers to produce high-quality predicted saliency maps. However, convolution is a local operation, so it struggles to capture long-range dependencies, and FCN-based methods consequently suffer from coarse object boundaries. In this paper, to solve these problems, we propose a novel transformer framework for salient object detection (named TF-SOD), which mainly consists of the encoder part of the FCN, a fusion module (FM), a transformer module (TM) and a feature decoder module (FDM). Specifically, the FM is a bridge connecting the encoder and the TM, providing some foresight for the non-local interaction of the TM. Besides, the FDM can efficiently decode the non-local features output by the TM and achieve deep fusion with local features. This architecture enables the network to achieve a close integration of local and non-local interactions, making the two kinds of information complementary and deeply mining the associations between features. Furthermore, we also propose a novel edge reinforcement learning strategy, which can effectively suppress edge blurring from both local and global aspects by means of the powerful network architecture. Extensive experiments on five datasets demonstrate that the proposed method outperforms 19 state-of-the-art methods.
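The abstract only names the modules (encoder, FM, TM, FDM) without giving their internals, so the following is a minimal NumPy sketch of the data flow it describes, under stated assumptions: the encoder's local feature map is a stand-in array, the FM is reduced to flattening the spatial grid into a token sequence, the TM is a single self-attention head (the generic non-local interaction), and the FDM is approximated by channel concatenation. All shapes, names and weights here are illustrative, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=64, seed=0):
    # Single-head scaled dot-product attention: every token attends to
    # every other token, which is the "non-local interaction" the
    # abstract attributes to the transformer module (TM).
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * d ** -0.5 for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))       # (n, n) attention weights
    return attn @ V                              # (n, d_k) non-local features

# Stand-in for the FCN encoder's local feature map (H, W, C).
f_local = np.random.default_rng(1).standard_normal((14, 14, 256))

# "FM" (assumed): collapse the spatial grid into a token sequence.
tokens = f_local.reshape(-1, 256)                # (196, 256)

# "TM": non-local interaction over all tokens.
f_nonlocal = self_attention(tokens)              # (196, 64)

# "FDM" (assumed): fuse non-local features back with the local map;
# concatenation here, whereas the paper describes a learned decoder.
fused = np.concatenate(
    [f_local, f_nonlocal.reshape(14, 14, 64)], axis=-1
)
print(fused.shape)  # (14, 14, 320)
```

The point of the sketch is the complementarity the abstract claims: `f_local` carries fine spatial detail from convolutions, `f_nonlocal` carries image-wide context from attention, and the decoder sees both.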


Metadata
Title
TF-SOD: a novel transformer framework for salient object detection
Authors
Zhenyu Wang
Yunzhou Zhang
Yan Liu
Zhuo Wang
Sonya Coleman
Dermot Kerr
Publication date
14-03-2022
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-022-07069-9
