Published in: International Journal of Computer Vision 4/2020

28 January 2020

A Simple and Light-Weight Attention Module for Convolutional Neural Networks

Authors: Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon

Abstract

Many aspects of deep neural networks, such as depth, width, and cardinality, have been studied to strengthen their representational power. In this work, we study the effect of attention in convolutional neural networks and present our idea as a simple, self-contained module called the Bottleneck Attention Module (BAM). Given an intermediate feature map, BAM efficiently produces an attention map along two factorized axes, channel and spatial, with negligible overhead. BAM is placed at the bottlenecks of various models, where feature maps are downsampled, and is trained jointly with the network end to end. We conduct ablation studies and extensive experiments on CIFAR-100/ImageNet classification, VOC2007/MS-COCO detection, super-resolution, and scene parsing with various architectures, including mobile-oriented networks. BAM shows consistent improvements across all experiments, demonstrating its wide applicability. The code and models are available at https://github.com/Jongchan/attentionmodule.
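The idea of an attention map factorized along channel and spatial axes can be illustrated with a minimal NumPy sketch. The abstract does not specify the layers inside each branch or how the branches are combined, so everything below is an assumption made for illustration: the function names, the weight shapes, the broadcast sum of the two branches, the sigmoid gate, and the residual application to the input. It should not be read as the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_branch(feat, w1, w2):
    """Per-channel logits, shape (C, 1, 1), from globally averaged features.

    Assumed design: global average pooling followed by a two-layer MLP.
    """
    pooled = feat.mean(axis=(1, 2))          # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # (C_reduced,), ReLU
    return (w2 @ hidden)[:, None, None]      # (C, 1, 1)


def spatial_branch(feat, ws):
    """Per-location logits, shape (1, H, W).

    Assumed design: a single 1x1 projection across channels.
    """
    return np.tensordot(ws, feat, axes=([0], [0]))[None, :, :]


def attention_module(feat, w1, w2, ws):
    """Gate a (C, H, W) feature map with a factorized attention map."""
    # The two branches broadcast against each other to a full (C, H, W) map.
    logits = channel_branch(feat, w1, w2) + spatial_branch(feat, ws)
    attn = sigmoid(logits)                   # values in (0, 1)
    # Assumed residual form: the input is kept and re-weighted features are added.
    return feat + feat * attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))        # C=8, H=W=4
w1 = rng.standard_normal((2, 8))             # hypothetical reduction 8 -> 2
w2 = rng.standard_normal((8, 2))
ws = rng.standard_normal(8)
out = attention_module(feat, w1, w2, ws)
print(out.shape)
```

Because the channel branch yields one value per channel and the spatial branch one value per location, the extra parameters and computation stay small relative to the feature map itself, which is the sense in which such a factorized design can be light-weight. The output has the same shape as the input, so in this sketch the module can sit between existing layers without changing the surrounding architecture.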

Metadata
Title: A Simple and Light-Weight Attention Module for Convolutional Neural Networks
Authors: Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon
Publication date: 28 January 2020
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 4/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-019-01283-0
