A Neural Autoregressive Approach to Attention-based Recognition

Authors: Yin Zheng, Richard S. Zemel, Yu-Jin Zhang, Hugo Larochelle

Published in: International Journal of Computer Vision | Issue 1/2015 (1 May 2015)


Abstract

Tasks that require the synchronization of perception and action are incredibly hard and pose a fundamental challenge to the fields of machine learning and computer vision. One important example of such a task is the problem of performing visual recognition through a sequence of controllable fixations; this requires jointly deciding what inference to perform from fixations and where to perform these fixations. While these two problems are challenging when addressed separately, they become even more formidable if solved jointly. Recently, a restricted Boltzmann machine (RBM) model was proposed that could learn meaningful fixation policies and achieve good recognition performance. In this paper, we propose an alternative approach based on a feed-forward, auto-regressive architecture, which permits exact calculation of training gradients (given the fixation sequence), unlike for the RBM model. On a problem of facial expression recognition, we demonstrate the improvement gained by this alternative approach. Additionally, we investigate several variations of the model in order to shed some light on successful strategies for fixation-based recognition.


This is done by setting \(\mathbf{z}(i_k,j_k) = \mathrm{sigmoid}\left(\bar{\mathbf{z}}(i_k,j_k)\right)\) and learning the unconstrained vectors \(\bar{\mathbf{z}}(i_k,j_k)\) instead. We also use a learning rate \(100\) times larger than the one used for the other parameters.
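The reparametrization in this note can be illustrated with a minimal sketch. It assumes a plain gradient-descent update and a toy squared-error loss, neither of which is stated in the source; the names `sgd_step`, `base_lr` and `lr_multiplier` and the value `base_lr=0.01` are hypothetical. Only the sigmoid mapping and the factor of 100 come from the text.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping reals into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def sgd_step(z_bar, grad_z, base_lr, lr_multiplier=100.0):
    """One gradient step on the unconstrained vector z_bar.

    grad_z is the gradient of the loss with respect to z = sigmoid(z_bar).
    The step size is base_lr * lr_multiplier; the factor 100 follows the
    text, while base_lr itself is an assumed value.
    """
    z = sigmoid(z_bar)
    grad_z_bar = grad_z * z * (1.0 - z)  # chain rule through the sigmoid
    return z_bar - base_lr * lr_multiplier * grad_z_bar


# Toy problem (assumption): pull z towards a fixed target inside (0, 1).
rng = np.random.default_rng(0)
z_bar = rng.normal(size=5)
target = np.full(5, 0.9)

for _ in range(5000):
    z = sigmoid(z_bar)
    grad_z = z - target  # gradient of 0.5 * ||z - target||^2 w.r.t. z
    z_bar = sgd_step(z_bar, grad_z, base_lr=0.01)

z = sigmoid(z_bar)  # stays strictly inside (0, 1) by construction
```

Because the optimizer only ever updates `z_bar`, the constraint on `z` holds after every step without any projection or clipping.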

The retinal transformation covered a patch of \(44\times 44\) pixels, without using a lower-resolution periphery. Hence, the total number of pixels is \(44 \times 44 = 1936\).


Title: A Neural Autoregressive Approach to Attention-based Recognition
Authors: Yin Zheng, Richard S. Zemel, Yu-Jin Zhang, Hugo Larochelle
Publication date: 1 May 2015
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 1/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0765-x
