Published in: International Journal of Multimedia Information Retrieval 4/2022

08-11-2022 | REGULAR PAPER

FDAM: full-dimension attention module for deep convolutional neural networks

Authors: Silin Cai, Changping Wang, Jiajun Ding, Jun Yu, Jianping Fan


Abstract

The attention mechanism is an important component of cross-modal research. It improves the performance of convolutional neural networks by distinguishing the informative parts of a feature map from the uninformative ones. Recent studies have proposed various kinds of attention, each weighting the parts of the feature map according to a different division of its dimensions. In this paper, we propose the full-dimension attention module (FDAM), a lightweight, fully interactive 3-D attention mechanism. FDAM generates 3-D attention maps over the spatial and channel dimensions in parallel and multiplies them with the feature map. Obtaining a discriminative attention value for each cell under channel interaction is difficult at low computational cost, so we adapt a generalized Elo rating mechanism to generate cell-level attention maps, storing historical information in a small number of non-trainable parameters to spread the computation across training iterations. The proposed module can be seamlessly integrated into end-to-end CNN training. Experiments demonstrate that it outperforms many existing attention mechanisms across network structures and datasets on computer vision tasks such as image classification and object detection.
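The abstract's core idea, cell-level 3-D attention driven by an Elo-style rating kept in non-trainable state, can be sketched as follows. This is an illustrative toy in NumPy, not the authors' implementation: the "match outcome" rule (a cell wins if its activation magnitude beats the batch mean), the K-factor, the 1500 baseline, and the sigmoid squashing are all assumptions chosen to make the mechanism concrete.

```python
import numpy as np

def elo_update(ratings, outcomes, k=32.0):
    """One Elo step: move each cell's rating toward its observed
    outcome against the mean rating of the population."""
    mean_r = ratings.mean()
    # expected score of each cell vs. the population mean (standard Elo logistic)
    expected = 1.0 / (1.0 + 10 ** ((mean_r - ratings) / 400.0))
    return ratings + k * (outcomes - expected)

class FullDimensionAttention:
    """Toy full-dimension (cell-level, C x H x W) attention.

    A non-trainable rating per cell persists across iterations, so the
    per-step cost stays low; the attention map is a sigmoid of the
    ratings, multiplied element-wise onto the input feature map.
    """
    def __init__(self, shape):
        self.ratings = np.full(shape, 1500.0)  # one rating per cell

    def __call__(self, x):
        # "match outcome": did this cell's activation beat the mean magnitude?
        outcomes = (np.abs(x) > np.abs(x).mean()).astype(float)
        self.ratings = elo_update(self.ratings, outcomes)
        # squash ratings into (0, 1) attention weights, centered at 1500
        attn = 1.0 / (1.0 + np.exp(-(self.ratings - 1500.0) / 100.0))
        return x * attn
```

Because the ratings live outside the gradient path, each forward pass performs only a cheap element-wise update, which matches the paper's stated goal of spreading the cost of channel interaction over training iterations.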


Metadata
Title
FDAM: full-dimension attention module for deep convolutional neural networks
Authors
Silin Cai
Changping Wang
Jiajun Ding
Jun Yu
Jianping Fan
Publication date
08-11-2022
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 4/2022
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-022-00248-3
