nach oben

International Journal of Computer Vision

Erschienen in:

28.01.2020

A Simple and Light-Weight Attention Module for Convolutional Neural Networks

verfasst von: Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon

Erschienen in: International Journal of Computer Vision | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Many aspects of deep neural networks, such as depth, width, or cardinality, have been studied to strengthen the representational power. In this work, we study the effect of attention in convolutional neural networks and present our idea in a simple self-contained module, called Bottleneck Attention Module (BAM). Given an intermediate feature map, BAM efficiently produces the attention map along two factorized axes, channel and spatial, with negligible overheads. BAM is placed at bottlenecks of various models where the downsampling of feature maps occurs, and is jointly trained in an end-to-end manner. Ablation studies and extensive experiments are conducted in CIFAR-100/ImageNet classification, VOC2007/MS-COCO detection, super resolution and scene parsing with various architectures including mobile-oriented networks. BAM shows consistent improvements over all experiments, demonstrating the wide applicability of BAM. The code and models are available at https://github.com/Jongchan/attentionmodule.

Vorheriger Artikel Learning on the Edge: Investigating Boundary Filters in CNNs

Nächster Artikel Simultaneous Deep Stereo Matching and Dehazing with Feature Attention

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

https://pytorch.org/.

Ba, J., Mnih, V., & Kavukcuoglu, K. (2015). Multiple object recognition with visual attention. In Proceedings of international conference on learning representations (ICLR).

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Bell, S., Lawrence Zitnick, C., Bala, K., & Girshick, R. (2016). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of computer vision and pattern recognition (CVPR).

Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., & Chua, T. S. (2017a) Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of computer vision and pattern recognition (CVPR).

Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W., & Chua, T. S. (2017b) Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5659–5667).

Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915.

Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of computer vision and pattern recognition (CVPR).

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 3.CrossRef

Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y. (2017). Deformable convolutional networks. CoRR, abs/170306211 1(2):3.

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of computer vision and pattern recognition (CVPR).

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). Draw: A recurrent neural network for image generation. In Proceedings of international conference on machine learning (ICML).

Han, D., Kim, J., & Kim, J. (2017). Deep pyramidal residual networks. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6307–6315). IEEE.

Hariharan, B., Arbeláez, P., Girshick, R., & Malik, J. (2015). Hypercolumns for object segmentation and fine-grained localization. In Proceedings of computer vision and pattern recognition (CVPR).

He, K., Zhang, X., Ren, S., Sun, J. (2016a). Deep residual learning for image recognition. In Proceedings of computer vision and pattern recognition (CVPR).

He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Identity mappings in deep residual networks. In Proceedings of European conference on computer vision (ECCV).

Hirsch, J., & Curcio, C. A. (1989). The spatial resolution capacity of human foveal retina. Vision Research, 29(9), 1095–1101.CrossRef

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Hu, J., Shen, L., Albanie, S., Sun, G., & Vedaldi, A. (2018a). Gather-excite: Exploiting feature context in convolutional neural networks. In Advances in neural information processing systems (pp. 9422–9432).

Hu, J., Shen, L., & Sun, G. (2018b). Squeeze-and-excitation networks. In Proceedings of computer vision and pattern recognition (CVPR).

Huang, G., Liu, Z., Weinberger, K. Q., & van der Maaten, L. (2017). Densely connected convolutional networks. In Proceedings of computer vision and pattern recognition (CVPR).

Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016). Deep networks with stochastic depth. In Proceedings of European conference on computer vision (ECCV).

Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3), 574–591.CrossRef

Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). Squeezenet: Alexnet-level accuracy with 50x fewer parameters and<0.5mb model size. arXiv preprint arXiv:1602.07360.

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of international conference on machine learning (ICML).

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. In IEEE transactions on pattern analysis machine intelligence (TPAMI).

Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015a). Spatial transformer networks. In Proceedings of neural information processing systems (NIPS).

Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015b). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).

Jia, X., De Brabandere, B., Tuytelaars, T., & Gool, L. V. (2016). Dynamic filter networks. In Advances in neural information processing systems (pp. 667–675).

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of neural information processing systems (NIPS).

Larochelle, H., & Hinton, G. E. (2010). Learning to combine foveal glimpses with a third-order Boltzmann machine. In Proceedings of neural information processing systems (NIPS).

Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A. P., Tejani, A., Totz, J., Wang, Z., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of computer vision and pattern recognition (CVPR).

Li, W., Zhu, X., & Gong, S. (2018). Harmonious attention network for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2285–2294).

Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Proceedings of European conference on computer vision (ECCV).

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). Ssd: Single shot multibox detector. In Proceedings of European conference on computer vision (ECCV).

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of computer vision and pattern recognition (CVPR).

Marr, D., & Vision, A. (1982). A computational investigation into the human representation and processing of visual information (Vol. 1, No. 2). WH San Francisco: Freeman and Company.

Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. Advances in neural information processing systems. In Proceedings of neural information processing systems (NIPS).

Morcos, A. S., Barrett, D. G., Rabinowitz, N. C., & Botvinick, M. (2018) On the importance of single directions for generalization. In Proceedings of international conference on learning representations (ICLR).

Nam, H., Ha, J. W., & Kim, J. (2017). Dual attention networks for multimodal reasoning and matching. In Proceedings of computer vision and pattern recognition (CVPR) (pp. 2156–2164).

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of neural information processing systems (NIPS).

Rensink, R. A. (2000). The dynamic representation of scenes. Visual Cognition, 7(1–3), 17–42.CrossRef

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.CrossRef

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520).

Simon, M., & Rodner, E. (2015). Neural activation constellations: Unsupervised part model discovery with convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 1143–1151).

Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of international conference on learning representations (ICLR).

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of computer vision and pattern recognition (CVPR).

Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., & Tang, X. (2017). Residual attention network for image classification. In Proceedings of computer vision and pattern recognition (CVPR).

Woo, S., Hwang, S., & Kweon, I. S. (2018a) Stairnet: Top-down semantic aggregation for accurate one shot detection. In Proceedings of winter conference on applications of computer vision (WACV).

Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018b) Cbam: Convolutional block attention module. In Proceedings of European conference on computer vision (ECCV).

Xiao, T., Liu, Y., Zhou, B., Jiang, Y., & Sun, J. (2018). Unified perceptual parsing for scene understanding. In Proceedings of European conference on computer vision (ECCV).

Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of computer vision and pattern recognition (CVPR).

Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of international conference on machine learning (ICML).

Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of computer vision and pattern recognition (CVPR).

Yu, F, & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In Proceedings of international conference on learning representations (ICLR).

Zagoruyko, S, & Komodakis, N. (2016). Wide residual networks. In Proceedings of British machine vision conference (BMVC).

Zeiler, M. D. (2012) Adadelta: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.

Zhang, X., Xiong, H., Zhou, W., Lin, W., & Tian, Q. (2016). Picking deep filter responses for fine-grained image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1134–1142).

Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2018). Semantic segmentation on MIT ADE20K dataset in PyTorch. https://github.com/CSAILVision/semantic-segmentation-pytorch/.

Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic understanding of scenes through the ade20k dataset. International Journal of Computer Vision (IJCV), 127, 302–321. https://doi.org/10.1007/s11263-018-1140-0. CrossRef

Zhu, Y., Zhao, C., Wang, J., Zhao, X., Wu, Y., & Lu, H. (2017). Couplenet: Coupling global structure with local parts for object detection. In Proceedings of international conference on computer vision (ICCV).

Titel: A Simple and Light-Weight Attention Module for Convolutional Neural Networks
verfasst von: Jongchan Park
Sanghyun Woo
Joon-Young Lee
In So Kweon
Publikationsdatum: 28.01.2020
Verlag: Springer US
Erschienen in: International Journal of Computer Vision / Ausgabe 4/2020
Print ISSN: 0920-5691
Elektronische ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-019-01283-0

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 4/2020

End-to-End Learning of Decision Trees and Forests

Learning Multi-human Optical Flow

Simultaneous Deep Stereo Matching and Dehazing with Feature Attention

Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in CNNs

KS(conf): A Light-Weight Test if a Multiclass Classifier Operates Outside of Its Specifications

EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection