
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

Authors: Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra

Published in: International Journal of Computer Vision | Issue 2/2020


Abstract

We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, ‘dog’ in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g., VGG), (2) CNNs used for structured outputs (e.g., captioning), and (3) CNNs used in tasks with multi-modal inputs (e.g., visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models learn to localize discriminative regions of the input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al. 2017) to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV (Agrawal et al. 2015) (http://gradcam.cloudcv.org) and a video at http://youtu.be/COjUB9Izk6E.
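To make the mechanics described above concrete, here is a minimal Grad-CAM sketch in PyTorch. It is not the authors' reference implementation (that is at the repository linked above); the torchvision model, weight enum, layer index, and class index below are illustrative assumptions.

```python
# A minimal Grad-CAM sketch in PyTorch (illustrative, not the reference code).
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class, conv_layer):
    """Return a (1, 1, H, W) Grad-CAM heatmap in [0, 1] for `target_class`."""
    activations, gradients = [], []

    # Hook the final conv layer to capture its feature maps A^k on the
    # forward pass and the gradients dy^c/dA^k on the backward pass.
    h1 = conv_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    scores = model(image)                    # (1, num_classes) class scores
    model.zero_grad()
    scores[0, target_class].backward()       # gradient of the target score y^c
    h1.remove(); h2.remove()

    A = activations[0]                       # (1, K, u, v) feature maps
    # Neuron-importance weights alpha_k^c: global-average-pool the gradients
    # over the spatial axes (the paper found this to beat global-max-pooling;
    # see footnote 1 below).
    alpha = gradients[0].mean(dim=(2, 3), keepdim=True)      # (1, K, 1, 1)
    # ReLU of the weighted combination keeps only regions with a positive
    # influence on the target class.
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))       # (1, 1, u, v)
    # Upsample the coarse u x v map to the input resolution for overlay.
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear',
                        align_corners=False)
    return cam / (cam.max() + 1e-8)

# Usage sketch: in torchvision's VGG-16, features[28] is the last conv layer;
# the random tensor stands in for a preprocessed 224x224 image, and 243 is
# the ImageNet class index for 'bull mastiff'.
model = models.vgg16(weights='IMAGENET1K_V1').eval()
heatmap = grad_cam(model, torch.randn(1, 3, 224, 224),
                   target_class=243, conv_layer=model.features[28])
print(heatmap.shape)  # torch.Size([1, 1, 224, 224])
```

Guided Grad-CAM is then obtained by pointwise multiplying this upsampled map with a fine-grained visualization such as Guided Backpropagation.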

Footnotes

1. Empirically, we found global-average-pooling to work better than global-max-pooling, as shown in the “Appendix”.
2. We find that Grad-CAM maps become progressively worse as we move to earlier convolutional layers, as these have smaller receptive fields and focus only on less semantic, local features.
3. We use GoogLeNet fine-tuned on COCO, as provided by Zhang et al. (2016).
4. c-MWP (Zhang et al. 2016) highlights arbitrary regions for predicted but non-existent categories, unlike Grad-CAM maps, which typically do not.
5. The green and red boxes are drawn manually to highlight the correct and incorrect focus of the model.
6. The area of overlap between the ground-truth concept annotation and the neuron activation, divided by the area of their union. More details of this metric can be found in Bau et al. (2017).
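In symbols (our notation, not the paper's): writing G for the set of pixels in the ground-truth concept annotation and M for the set of pixels where the thresholded neuron activation fires, footnote 6 describes the standard intersection-over-union score:

```latex
% IoU between ground-truth annotation G and thresholded activation mask M
\mathrm{IoU}(G, M) = \frac{\lvert G \cap M \rvert}{\lvert G \cup M \rvert}
```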
 
Literature
Agrawal, A., Batra, D., & Parikh, D. (2016). Analyzing the behavior of visual question answering models. In EMNLP.
Agrawal, H., Mathialagan, C. S., Goyal, Y., Chavali, N., Banik, P., Mohapatra, A., Osman, A., & Batra, D. (2015). CloudCV: Large scale distributed computer vision as a cloud service. In Mobile cloud visual media computing (pp. 265–290). Springer.
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). VQA: Visual question answering. In ICCV.
Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In CVPR.
Bazzani, L., Bergamo, A., Anguelov, D., & Torresani, L. (2016). Self-taught object localization with deep networks. In WACV.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Chen, X., Fang, H., Lin, T.-Y., Vedantam, R., Gupta, S., Dollár, P., & Zitnick, C. L. (2015). Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325.
Cinbis, R. G., Verbeek, J., & Schmid, C. (2016). Weakly supervised object localization with multi-fold multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Das, A., Agrawal, H., Zitnick, C. L., Parikh, D., & Batra, D. (2016). Human attention in visual question answering: Do humans and deep networks look at the same regions? In EMNLP.
Das, A., Datta, S., Gkioxari, G., Lee, S., Parikh, D., & Batra, D. (2018). Embodied question answering. In CVPR.
Das, A., Kottur, S., Gupta, K., Singh, A., Yadav, D., Moura, J. M., Parikh, D., & Batra, D. (2017a). Visual dialog. In CVPR.
Das, A., Kottur, S., Moura, J. M., Lee, S., & Batra, D. (2017b). Learning cooperative visual dialog agents with deep reinforcement learning. In ICCV.
de Vries, H., Strub, F., Chandar, S., Pietquin, O., Larochelle, H., & Courville, A. C. (2017). GuessWhat?! Visual object discovery through multi-modal dialogue. In CVPR.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks. In CVPR.
Erhan, D., Bengio, Y., Courville, A., & Vincent, P. (2009). Visualizing higher-layer features of a deep network. University of Montreal, 1341.
Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J. C., et al. (2015). From captions to visual concepts and back. In CVPR.
Gan, C., Wang, N., Yang, Y., Yeung, D.-Y., & Hauptmann, A. G. (2015). DevNet: A deep event network for multimedia event detection and evidence recounting. In CVPR.
Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., & Xu, W. (2015). Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR.
Gordon, D., Kembhavi, A., Rastegari, M., Redmon, J., Fox, D., & Farhadi, A. (2017). IQA: Visual question answering in interactive environments. arXiv preprint arXiv:1712.03316.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In ECCV.
Jackson, P. (1998). Introduction to expert systems (3rd ed.). Boston, MA: Addison-Wesley Longman Publishing Co., Inc.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In ACM MM.
Johns, E., Mac Aodha, O., & Brostow, G. J. (2015). Becoming the expert—interactive multi-class machine teaching. In CVPR.
Johnson, J., Karpathy, A., & Fei-Fei, L. (2016). DenseCap: Fully convolutional localization networks for dense captioning. In CVPR.
Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In CVPR.
Kolesnikov, A., & Lampert, C. H. (2016). Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Lin, M., Chen, Q., & Yan, S. (2014a). Network in network. In ICLR.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014b). Microsoft COCO: Common objects in context. In ECCV.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.
Lu, J., Yang, J., Batra, D., & Parikh, D. (2016). Hierarchical question-image co-attention for visual question answering. In NIPS.
Mahendran, A., & Vedaldi, A. (2016a). Salient deconvolutional networks. In ECCV.
Mahendran, A., & Vedaldi, A. (2016b). Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 1–23.
Malinowski, M., Rohrbach, M., & Fritz, M. (2015). Ask your neurons: A neural-based approach to answering questions about images. In ICCV.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In CVPR.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2015). Is object localization for free?—Weakly-supervised learning with convolutional neural networks. In CVPR.
Pinheiro, P. O., & Collobert, R. (2015). From image-level to pixel-level labeling with convolutional networks. In CVPR.
Ren, M., Kiros, R., & Zemel, R. (2015). Exploring models and data for image question answering. In NIPS.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. In SIGKDD.
Selvaraju, R. R., Chattopadhyay, P., Elhoseiny, M., Sharma, T., Batra, D., Parikh, D., & Lee, S. (2018). Choose your neuron: Incorporating domain knowledge through neuron-importance. In ECCV (pp. 526–541).
Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., & Batra, D. (2016). Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR. arXiv:1610.02391.
Selvaraju, R. R., Lee, S., Shen, Y., Jin, H., Ghosh, S., Heck, L., Batra, D., & Parikh, D. (2019). Taking a hint: Leveraging explanations to make vision and language models more grounded. In ICCV.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR. arXiv:1312.6034.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. A. (2014). Striving for simplicity: The all convolutional net. CoRR. arXiv:1412.6806.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In CVPR (pp. 2818–2826).
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In CVPR.
Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). HOGgles: Visualizing object detection features. In ICCV.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.
Zhang, J., Lin, Z., Brandt, J., Shen, X., & Sclaroff, S. (2016). Top-down neural attention by excitation backprop. In ECCV.
Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A. (2014). Object detectors emerge in deep scene CNNs. CoRR. arXiv:1412.6856.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In CVPR.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Metadata
Title
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
Authors
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
Publication date
11-10-2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01228-7
