Published in: International Journal of Computer Vision 3/2016

01.12.2016

Visualizing Deep Convolutional Neural Networks Using Natural Pre-images

Authors: Aravindh Mahendran, Andrea Vedaldi



Abstract

Image representations, from SIFT and bag of visual words to convolutional neural networks (CNNs), are a crucial component of almost all computer vision systems. However, our understanding of them remains limited. In this paper we study several landmark representations, both shallow and deep, using a number of complementary visualization techniques. These visualizations are based on the concept of the “natural pre-image”, namely a natural-looking image whose representation has some notable property. We study in particular three such visualizations: inversion, in which the aim is to reconstruct an image from its representation; activation maximization, in which we search for patterns that maximally stimulate a representation component; and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We pose these as a regularized energy-minimization framework and demonstrate its generality and effectiveness. In particular, we show that this method can invert representations such as HOG more accurately than recent alternatives while being applicable to CNNs too. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
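The “regularized energy-minimization framework” mentioned in the abstract casts each visualization as finding a pre-image \(\mathbf{x}\) that minimizes a representation-matching (or, for activation maximization, representation-stimulating) term plus natural-image regularizers. The sketch below is not the authors' code; it illustrates the inversion case only, and the AlexNet feature extractor, the \(\alpha\)-norm and \(V^\beta\) regularizer weights, and the plain SGD optimizer are all illustrative assumptions.

```python
import torch
import torchvision.models as models

# Feature extractor standing in for a representation Phi; AlexNet conv layers
# are an assumption here, not the specific networks studied in the paper.
cnn = models.alexnet().features.eval()

def phi(x):
    return cnn(x)

def v_beta(x, beta=2.0):
    # Finite-difference V^beta regularizer (quadratic for beta == 2, TV-like for beta == 1).
    dh = x[:, :, 1:, :] - x[:, :, :-1, :]
    dw = x[:, :, :, 1:] - x[:, :, :, :-1]
    return (dh[:, :, :, :-1] ** 2 + dw[:, :, :-1, :] ** 2).pow(beta / 2.0).sum()

x0 = torch.randn(1, 3, 224, 224)        # stand-in for a (mean-subtracted) input image
target = phi(x0).detach()               # representation code to be inverted

x = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.01, momentum=0.9)
lam_alpha, lam_v = 1e-6, 1e-4           # illustrative regularizer weights, not the paper's values

for _ in range(200):
    opt.zero_grad()
    # Normalized data term plus alpha-norm and V^beta image priors.
    data_term = (phi(x) - target).pow(2).sum() / target.pow(2).sum()
    loss = data_term + lam_alpha * x.norm(p=6) ** 6 + lam_v * v_beta(x, beta=2.0)
    loss.backward()
    opt.step()
```

Activation maximization and caricaturization follow the same pattern with the data term replaced by a term that rewards, rather than matches, the target responses.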


Footnotes
1
This regularizer is referred to as the TV norm in Mahendran and Vedaldi (2015), but for \(\beta = 2\) it is actually a quadratic norm.
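For concreteness, the penalty in question is the finite-difference regularizer (written here following Mahendran and Vedaldi 2015; the exact discretization may differ in minor details):
\[
\mathcal{R}_{V^\beta}(\mathbf{x}) \;=\; \sum_{i,j}\Big(\big(x_{i,j+1}-x_{i,j}\big)^2+\big(x_{i+1,j}-x_{i,j}\big)^2\Big)^{\beta/2},
\]
so that \(\beta = 1\) gives a discrete total-variation penalty, while \(\beta = 2\) removes the square root and the penalty becomes a quadratic function of the finite differences.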
 
2
In the following, the image \(\mathbf {x}\) is assumed to have null mean, as required by most CNN implementations.
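As a minimal sketch of this preprocessing step (the input and mean image below are placeholders, not data from the paper):

```python
import torch

x_raw = torch.rand(1, 3, 224, 224)   # placeholder input image in [0, 1]
mean_image = x_raw.mean()            # stand-in for the dataset mean image (a single scalar here)
x = x_raw - mean_image               # the "null mean" input fed to the network
```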
 
3
This requires addressing a few more subtleties. Please see files dsift_net.m and hog_net.m for details.
 
References
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Clarendon Press.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of BMVC.
Chen, Y., Ranftl, R., & Pock, T. (2014). A bi-level view of inpainting-based image compression. In Proceedings of computer vision winter workshop.
Csurka, G., Dance, C. R., Dan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV workshop on statistical learning in computer vision.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.
d’Angelo, E., Alahi, A., & Vandergheynst, P. (2012). Beyond bits: Reconstructing images from local binary descriptors. In ICPR (pp. 935–938).
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
Erhan, D., Bengio, Y., Courville, A., & Vincent, P. (2009). Visualizing higher-layer features of a deep network. Technical report (Vol. 1341). Montreal: University of Montreal.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks. In NIPS proceedings.
Huang, W. J., & Mumford, D. (1999). Statistics of natural images and models. In Proceedings of CVPR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.
Jensen, C. A., Reed, R. D., Marks, R. J., El-Sharkawi, M., Jung, J. B., Miyamoto, R., et al. (1999). Inversion of feedforward neural networks: Algorithms and applications. Proceedings of the IEEE, 87(9).
Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290(5802), 91–97.
Kato, H., & Harada, T. (2014). Image reconstruction from bag-of-visual-words. In CVPR.
Krishnan, D., & Fergus, R. (2009). Fast image deconvolution using hyper-Laplacian priors. In NIPS.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Lee, S., & Kil, R. M. (1994). Inverse mapping of continuous functions using local and global information. IEEE Transactions on Neural Networks, 5(3).
Leung, T., & Malik, J. (2001). Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV, 43(1).
Linden, A., & Kindermann, J. (1989). Inversion of multilayer nets. In Proceedings of international conference on neural networks.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In ICCV.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91–110.
Lu, B. L., Kita, H., & Nishikawa, Y. (1999). Inverting feedforward neural networks using linear and nonlinear programming. IEEE Transactions on Neural Networks, 10(6).
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Proceedings of CVPR.
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of CVPR.
Nowak, E., Jurie, F., & Triggs, B. (2006). Sampling strategies for bag-of-features image classification. In ECCV.
Perronnin, F., & Dance, C. (2006). Fisher kernels on visual vocabularies for image categorization. In CVPR.
Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. IJCV, 40, 49–70.
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). OverFeat: Integrated recognition, localization and detection using convolutional networks. In CoRR. arXiv:1312.6229.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In Proceedings of ICLR.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., et al. (2014). Intriguing properties of neural networks. In Proceedings of ICLR.
Tatu, A., Lauze, F., Nielsen, M., & Kimia, B. (2011). Exploring the representation capabilities of the HOG descriptor. In ICCV workshop.
Várkonyi-Kóczy, A. R., & Rövid, A. (2005). Observer based iterative neural network model inversion. In IEEE international conference on fuzzy systems.
Vedaldi, A. (2007). An open implementation of the SIFT detector and descriptor. Technical report 070012. Los Angeles: UCLA CSD.
Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). HOGgles: Visualizing object detection features. In ICCV.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Weinzaepfel, P., Jégou, H., & Pérez, P. (2011). Reconstructing an image from its local descriptors. In CVPR.
Williams, R. J. (1986). Inverting a connectionist network mapping by back-propagation of error. In Proceedings of CogSci.
Yang, J., Yu, K., & Huang, T. (2010). Supervised translation-invariant sparse coding. In CVPR.
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., & Lipson, H. (2015). Understanding neural networks through deep visualization. In Proceedings of ICML workshop.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.
Zhou, X., Yu, K., Zhang, T., & Huang, T. S. (2010). Image classification using super-vector coding of local image descriptors. In ECCV.
Zhu, C., Byrd, R. H., Lu, P., & Nocedal, J. (1997). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 23(4), 550–560.
Zhu, S. C., Wu, Y., & Mumford, D. (1998). Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. IJCV, 27(2).
Metadata
Title
Visualizing Deep Convolutional Neural Networks Using Natural Pre-images
Authors
Aravindh Mahendran
Andrea Vedaldi
Publication date
01.12.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0911-8
