Published in: International Journal of Computer Vision 3/2016

01.12.2016

Visualizing Deep Convolutional Neural Networks Using Natural Pre-images

Authors: Aravindh Mahendran, Andrea Vedaldi



Abstract

Image representations, from SIFT and bag of visual words to convolutional neural networks (CNNs), are a crucial component of almost all computer vision systems. However, our understanding of them remains limited. In this paper we study several landmark representations, both shallow and deep, using a number of complementary visualization techniques. These visualizations are based on the concept of the “natural pre-image”, namely a natural-looking image whose representation has some notable property. We study in particular three such visualizations: inversion, in which the aim is to reconstruct an image from its representation; activation maximization, in which we search for patterns that maximally stimulate a representation component; and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We pose these as a regularized energy-minimization framework and demonstrate its generality and effectiveness. In particular, we show that this method can invert representations such as HOG more accurately than recent alternatives while being applicable to CNNs too. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
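The “regularized energy-minimization framework” mentioned in the abstract casts each visualization as finding a pre-image \(\mathbf{x}\) that minimizes a representation-matching (or, for activation maximization, representation-stimulating) term plus natural-image regularizers. The sketch below is not the authors' code; it illustrates the inversion case only, and the AlexNet feature extractor, the \(\alpha\)-norm and \(V^\beta\) regularizer weights, and the plain SGD optimizer are all illustrative assumptions.

```python
import torch
import torchvision.models as models

# Feature extractor standing in for a representation Phi; AlexNet conv layers
# are an assumption here, not the specific networks studied in the paper.
cnn = models.alexnet().features.eval()

def phi(x):
    return cnn(x)

def v_beta(x, beta=2.0):
    # Finite-difference V^beta regularizer (quadratic for beta == 2, TV-like for beta == 1).
    dh = x[:, :, 1:, :] - x[:, :, :-1, :]
    dw = x[:, :, :, 1:] - x[:, :, :, :-1]
    return (dh[:, :, :, :-1] ** 2 + dw[:, :, :-1, :] ** 2).pow(beta / 2.0).sum()

x0 = torch.randn(1, 3, 224, 224)        # stand-in for a (mean-subtracted) input image
target = phi(x0).detach()               # representation code to be inverted

x = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.01, momentum=0.9)
lam_alpha, lam_v = 1e-6, 1e-4           # illustrative regularizer weights, not the paper's values

for _ in range(200):
    opt.zero_grad()
    # Normalized data term plus alpha-norm and V^beta image priors.
    data_term = (phi(x) - target).pow(2).sum() / target.pow(2).sum()
    loss = data_term + lam_alpha * x.norm(p=6) ** 6 + lam_v * v_beta(x, beta=2.0)
    loss.backward()
    opt.step()
```

Activation maximization and caricaturization follow the same pattern with the data term replaced by a term that rewards, rather than matches, the target responses.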


Footnotes
1
This regularizer is referred to as the TV norm in Mahendran and Vedaldi (2015), but for \(\beta = 2\) it is actually a quadratic norm.
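For concreteness, the penalty in question is the finite-difference regularizer (written here following Mahendran and Vedaldi 2015; the exact discretization may differ in minor details):
\[
\mathcal{R}_{V^\beta}(\mathbf{x}) \;=\; \sum_{i,j}\Big(\big(x_{i,j+1}-x_{i,j}\big)^2+\big(x_{i+1,j}-x_{i,j}\big)^2\Big)^{\beta/2},
\]
so that \(\beta = 1\) gives a discrete total-variation penalty, while \(\beta = 2\) removes the square root and the penalty becomes a quadratic function of the finite differences.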
 
2
In the following, the image \(\mathbf {x}\) is assumed to have null mean, as required by most CNN implementations.
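As a minimal sketch of this preprocessing step (the input and mean image below are placeholders, not data from the paper):

```python
import torch

x_raw = torch.rand(1, 3, 224, 224)   # placeholder input image in [0, 1]
mean_image = x_raw.mean()            # stand-in for the dataset mean image (a single scalar here)
x = x_raw - mean_image               # the "null mean" input fed to the network
```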
 
3
This requires addressing a few more subtleties. Please see files dsift_net.m and hog_net.m for details.
 
References
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Clarendon Press.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of BMVC.
Chen, Y., Ranftl, R., & Pock, T. (2014). A bi-level view of inpainting-based image compression. In Proceedings of computer vision winter workshop.
Csurka, G., Dance, C. R., Dan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV workshop on statistical learning in computer vision.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.
d’Angelo, E., Alahi, A., & Vandergheynst, P. (2012). Beyond bits: Reconstructing images from local binary descriptors. In ICPR (pp. 935–938).
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
Erhan, D., Bengio, Y., Courville, A., & Vincent, P. (2009). Visualizing higher-layer features of a deep network. Technical report (Vol. 1341). Montreal: University of Montreal.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks. In NIPS proceedings.
Huang, W. J., & Mumford, D. (1999). Statistics of natural images and models. In Proceedings of CVPR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.
Jensen, C. A., Reed, R. D., Marks, R. J., El-Sharkawi, M., Jung, J. B., Miyamoto, R., et al. (1999). Inversion of feedforward neural networks: Algorithms and applications. Proceedings of the IEEE, 87(9).
Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290(5802), 91–97.
Kato, H., & Harada, T. (2014). Image reconstruction from bag-of-visual-words. In CVPR.
Krishnan, D., & Fergus, R. (2009). Fast image deconvolution using hyper-Laplacian priors. In NIPS.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Lee, S., & Kil, R. M. (1994). Inverse mapping of continuous functions using local and global information. IEEE Transactions on Neural Networks, 5(3).
Leung, T., & Malik, J. (2001). Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV, 43(1).
Linden, A., & Kindermann, J. (1989). Inversion of multilayer nets. In Proceedings of international conference on neural networks.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In ICCV.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91–110.
Lu, B. L., Kita, H., & Nishikawa, Y. (1999). Inverting feedforward neural networks using linear and nonlinear programming. IEEE Transactions on Neural Networks, 10(6).
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Proceedings of CVPR.
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of CVPR.
Nowak, E., Jurie, F., & Triggs, B. (2006). Sampling strategies for bag-of-features image classification. In ECCV.
Perronnin, F., & Dance, C. (2006). Fisher kernels on visual vocabularies for image categorization. In CVPR.
Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. IJCV, 40, 49–70.
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). OverFeat: Integrated recognition, localization and detection using convolutional networks. In CoRR. arXiv:1312.6229.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In Proceedings of ICLR.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., et al. (2014). Intriguing properties of neural networks. In Proceedings of ICLR.
Tatu, A., Lauze, F., Nielsen, M., & Kimia, B. (2011). Exploring the representation capabilities of the HOG descriptor. In ICCV workshop.
Várkonyi-Kóczy, A. R., & Rövid, A. (2005). Observer based iterative neural network model inversion. In IEEE international conference on fuzzy systems.
Vedaldi, A. (2007). An open implementation of the SIFT detector and descriptor. Technical report 070012. Los Angeles: UCLA CSD.
Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). HOGgles: Visualizing object detection features. In ICCV.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Weinzaepfel, P., Jégou, H., & Pérez, P. (2011). Reconstructing an image from its local descriptors. In CVPR.
Williams, R. J. (1986). Inverting a connectionist network mapping by back-propagation of error. In Proceedings of CogSci.
Yang, J., Yu, K., & Huang, T. (2010). Supervised translation-invariant sparse coding. In CVPR.
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., & Lipson, H. (2015). Understanding neural networks through deep visualization. In Proceedings of ICML workshop.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.
Zhou, X., Yu, K., Zhang, T., & Huang, T. S. (2010). Image classification using super-vector coding of local image descriptors. In ECCV.
Zhu, C., Byrd, R. H., Lu, P., & Nocedal, J. (1997). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 23(4), 550–560.
Zhu, S. C., Wu, Y., & Mumford, D. (1998). Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. IJCV, 27(2).
Metadata
Title
Visualizing Deep Convolutional Neural Networks Using Natural Pre-images
Authors
Aravindh Mahendran
Andrea Vedaldi
Publication date
01.12.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0911-8
