Elsevier

Neurocomputing

Volume 99, 1 January 2013, Pages 575-580

Letters
Handwritten digit recognition using biologically inspired features

https://doi.org/10.1016/j.neucom.2012.07.027

Abstract

Image recognition problems are usually difficult to solve using raw pixel data. Improving recognition often requires some form of feature extraction that represents the data in a feature space. We use the output of a biologically inspired model for visual recognition as a feature space. The output of the model is a binary code, which is used to train a linear classifier for recognizing handwritten digits on the MNIST and USPS datasets. We evaluate the robustness of the approach to a variable number of training samples and compare its performance on these popular datasets to other published results. We achieve competitive error rates on both datasets while greatly improving on related networks that use a linear classifier.

Introduction

Handwritten digit recognition, despite being a well-studied problem, is still an active topic of research. The problem is relevant for tasks such as postal mail sorting and form data processing. Several works have addressed it from a feature extraction or a classification perspective. In this text we analyze the application of the map transformation cascade (MTC) [1] to this task, where it works as a feature extractor combined with a classifier. MTC is a model for visual recognition in which simple and complex cells are arranged in a hierarchy, as proposed by Hubel and Wiesel for the visual cortex [2] and incorporated in several models such as Neocognitron [3] and HMAX [4]. In [1] the relation between MTC and Neocognitron was established, and the two were compared using a nearest neighbor classifier. In this text we discuss how MTC relates to HMAX [4] and how it compares with other pattern recognition methods on two popular datasets of handwritten digits using a linear classifier. A combination of HMAX's features and a classifier has been shown to achieve good results on object recognition [5].

In the next section we give a short overview of biological vision and computational models for visual recognition. Afterwards we describe MTC and then evaluate its performance on handwritten digit recognition using the USPS and MNIST datasets. We analyze how the performance of the approach is affected by the number of training samples and finally measure the error rate on the entire dataset.

Section snippets

Related work

The classical hypothesis of Hubel and Wiesel [6] has been transposed into several computational models for visual recognition. The key idea is that two kinds of cells are arranged in layers: simple cells are selective for a particular stimulus and for the position of that stimulus in the visual field, while complex cells are also selective for a particular stimulus but less selective for its position in the visual field. These two types of cells are then arranged in a hierarchy where the cells'

Map transformation cascade

In this section we describe MTC, which was previously proposed in [1]. The model was proposed to retain the functional principles of Neocognitron in a computationally simpler way. MTC is composed of two types of cells arranged hierarchically. Simple cells are responsible for selectivity by reacting to a particular stimulus. Complex cells are responsible for invariance to the position of the stimulus. The two types of cells are arranged in layers of the same cell type. Layers are arranged in ordered
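The division of labour between the two cell types can be illustrated with a toy one-dimensional sketch. This is not the MTC algorithm from [1]; the exact-match rule for simple cells, the logical OR over non-overlapping windows for complex cells, and all names (`simple_layer`, `complex_layer`, `template`, `pool`) are assumptions chosen only to show selectivity followed by position tolerance on binary data.

```python
def simple_layer(signal, template):
    """Simple cells (toy): fire (1) only where the template matches at that exact position."""
    width = len(template)
    return [1 if signal[i:i + width] == template else 0
            for i in range(len(signal) - width + 1)]


def complex_layer(responses, pool):
    """Complex cells (toy): fire if any simple cell inside a non-overlapping window fired."""
    return [1 if any(responses[i:i + pool]) else 0
            for i in range(0, len(responses), pool)]


# Hypothetical stimulus and two inputs that contain it at neighbouring positions.
template = [1, 1]
signal_a = [0, 1, 1, 0, 0, 0, 0, 0, 0]
signal_b = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # same stimulus, shifted by one position

simple_a = simple_layer(signal_a, template)
simple_b = simple_layer(signal_b, template)
complex_a = complex_layer(simple_a, pool=4)
complex_b = complex_layer(simple_b, pool=4)
```

In this sketch the simple-cell responses of the two inputs differ because they encode position, while the complex-cell responses coincide because the small shift is absorbed by the pooling window. The output of the complex layer is a binary vector.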

Experiments

In the experiments we evaluate the performance of MTC combined with a linear SVM.

An SVM, as originally proposed, solves a binary classification problem [36]. For the multi-class problem we used the ‘one-against-one’ approach [37], [38]. We therefore solve a binary classification problem for every pair of classes, training k(k−1)/2 binary classifiers. The outputs of the binary classifiers are then combined by voting [39]. Another possible approach is ‘one-against-all’, for a comparison
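The one-against-one scheme with voting can be sketched as follows. This is a minimal illustration, not the authors' implementation: a plain perceptron stands in for the linear SVM so that the example needs only the standard library, the toy binary codes are made up, and every function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(k):
    """Number of binary problems in one-against-one: k(k-1)/2."""
    return k * (k - 1) // 2


def train_perceptron(samples, targets, epochs=20):
    """Stand-in linear binary classifier (a perceptron, NOT an SVM); targets are +1/-1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if t * score <= 0:  # misclassified or on the boundary: update
                weights = [w + t * xi for w, xi in zip(weights, x)]
                bias += t
    return weights, bias


def train_one_against_one(samples, labels):
    """Train one binary classifier for every pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if y == a else -1) for x, y in zip(samples, labels) if y in (a, b)]
        models[(a, b)] = train_perceptron([x for x, _ in pair], [t for _, t in pair])
    return models


def predict_by_voting(models, x):
    """Each pairwise classifier casts one vote; the class with the most votes wins."""
    votes = Counter()
    for (a, b), (weights, bias) in models.items():
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        votes[a if score > 0 else b] += 1
    return votes.most_common(1)[0][0]


# Toy binary codes for three classes (made-up data, not MNIST or USPS).
codes = [[1, 0, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1]]
labels = [0, 0, 1, 1, 2, 2]
models = train_one_against_one(codes, labels)
predictions = [predict_by_voting(models, x) for x in codes]
```

For the ten digit classes, `num_pairwise_classifiers(10)` gives 45 binary problems. Ties between classes are broken here by the order in which votes were first cast, which is an arbitrary choice of this sketch.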

Conclusion

We evaluated the combination of MTC with a linear classifier. MTC showed good generalization for a small number of training examples. The combination of MTC and a linear SVM achieved competitive error rates on both the USPS (2.64%) and MNIST (0.71%) datasets. MTC greatly improves on the results obtained using a deep belief network with a linear SVM [48]. It is also interesting that in [27] quasi-binary codes are unsuitable for classification, while the MTC binary codes can be used for classification

Acknowledgments

The authors would like to thank João Sacramento for many helpful comments. This work was supported by Fundação para a Ciência e Tecnologia (INESC-ID multiannual funding) through the PIDDAC Program funds and through an individual doctoral grant awarded to the first author (Contract SFRH/BD/61513/2009).

Ângelo Cardoso received an MSc in Information Systems and Computer Engineering from the Instituto Superior Técnico (IST), Technical University of Lisbon (TU-Lisbon) in 2007. Since 2009 he has been a PhD student at IST, TU-Lisbon and INESC-ID. His PhD work is supported by an individual scholarship from Fundação para a Ciência e Tecnologia (FCT). He is currently working on biological models for object recognition and machine learning.

References (48)

  • D. Hubel et al., Eye, Brain, and Vision (1988)
  • D.H. Hubel et al., Uniformity of monkey striate cortex: a parallel relationship between field size, scatter, and magnification factor, J. Comp. Neurol. (1974)
  • E. Kobatake et al., Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, J. Neurophysiol. (1994)
  • H. Bülthoff et al., Psychophysical support for a two-dimensional view interpolation theory of object recognition, Proc. Natl. Acad. Sci. U.S.A. (1992)
  • M.C. Booth et al., View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex, Cerebral Cortex (1998)
  • D.H. Hubel et al., Sequence regularity and geometry of orientation columns in the monkey striate cortex, J. Comp. Neurol. (1974)
  • C. Blakemore et al., Lateral inhibition between orientation detectors in the cat's visual cortex, Exp. Brain Res. (1972)
  • K. Fukushima, Cognitron: a self-organizing multilayered neural network, Biol. Cybern. (1975)
  • M. Riesenhuber, How a Part of the Brain Might or Might Not Work: A New Hierarchical Model of Object Recognition, Ph.D....
  • T. Serre, M. Riesenhuber, Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for...
  • R. Miikkulainen et al., Computational Maps in the Visual Cortex (2005)
  • Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
  • S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...
  • D. Lowe, Towards a computational model for object recognition in IT cortex, in: Biologically Motivated Computer Vision,...

Andreas Wichert studied computer science at the University of Saarland, where he graduated in 1993. Afterwards, he became a PhD student at the Department of Neural Information Processing, University of Ulm. He received a PhD in computer science in 2000. He has since worked in the field of fMRI as a researcher with an interdisciplinary group, Department of Psychiatry III Ulm, and then moved to F&K Delvotec bonding machines, where he led the development of a diagnostic expert system. From 2004 to 2005 he was the scientific director of MITI Research Group Klinikum rechts der Isar of the Technical University Munich. Recently he joined the Faculdade de Ciências da Universidade de Lisboa Departamento de Informática and Departamento de Informática, Universidade Técnica de Lisboa (DEI-IST).
