Handwritten digit recognition using biologically inspired features

doi:10.1016/j.neucom.2012.07.027

Neurocomputing

Volume 99, 1 January 2013, Pages 575-580

https://doi.org/10.1016/j.neucom.2012.07.027

Abstract

Image recognition problems are usually difficult to solve using raw pixel data. Improving recognition often requires some form of feature extraction that represents the data in a feature space. We use the output of a biologically inspired model for visual recognition as a feature space. The output of the model is a binary code, which is used to train a linear classifier for recognizing handwritten digits on the MNIST and USPS datasets. We evaluate the robustness of the approach to a variable number of training samples and compare its performance on these popular datasets with other published results. We achieve competitive error rates on both datasets while greatly improving on related networks that use a linear classifier.

Introduction

Handwritten digit recognition, despite being a well-studied problem, is still an active topic of research. The problem is relevant to tasks such as postal mail sorting and form data processing. Several works have addressed it from a feature extraction or a classification perspective. In this text we analyze the application of the map transformation cascade (MTC) [1] to this task, where it works as a feature extractor combined with a classifier. MTC is a model for visual recognition in which simple and complex cells are arranged in a hierarchy, as proposed by Hubel and Wiesel for the visual cortex [2] and incorporated in several models such as the Neocognitron [3] and HMAX [4]. In [1] the relation between MTC and the Neocognitron was established, and the two were compared, using a nearest neighbor classifier. In this text we discuss how MTC relates to HMAX [4] and how it compares with other pattern recognition methods on two popular datasets of handwritten digits using a linear classifier. A combination of HMAX's features and a classifier has been shown to achieve good results on object recognition [5].

In the next section we give a short overview of biological vision and computational models for visual recognition. Afterwards we describe MTC and evaluate its performance on handwritten digit recognition using the USPS and MNIST datasets. We analyze how the performance of the approach is affected by the number of training samples and finally measure the error rate on the entire dataset.

Section snippets

Related work

The classical hypothesis of Hubel and Wiesel [6] has been transposed into several computational models for visual recognition. The key idea is that two kinds of cells are arranged in layers: simple cells, which are selective for a particular stimulus and for the position of that stimulus in the visual field, and complex cells, which are also selective for a particular stimulus but less selective for its position in the visual field. These two types of cells are then arranged in a hierarchy where the cells'

Map transformation cascade

In this section we describe MTC, which was previously proposed in [1]. The model was proposed to retain the functional principles of the Neocognitron in a computationally simpler way. MTC is composed of two types of cells arranged hierarchically. Simple cells are responsible for selectivity by reacting to a particular stimulus. Complex cells are responsible for invariance to the position of the stimulus. The two types of cells are arranged in layers of the same cell type. Layers are arranged in ordered
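The division of labour between the two cell types can be illustrated with a small Python sketch. It is not the MTC implementation from [1]: the function names, the pixel-matching rule, the threshold and the non-overlapping pooling blocks are all assumptions chosen only to show selectivity (a simple cell fires for one stimulus at one position) and position invariance (a complex cell fires if any simple cell in its neighbourhood fired).

```python
def simple_cell_map(image, template, threshold):
    """Binary map: 1 where at least `threshold` pixels of the patch equal the template.

    Hypothetical matching rule, used here only to illustrate selectivity.
    """
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for r in range(h - th + 1):
        row = []
        for c in range(w - tw + 1):
            matches = sum(
                image[r + i][c + j] == template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(1 if matches >= threshold else 0)
        out.append(row)
    return out


def complex_cell_map(binary_map, size):
    """Pool non-overlapping size x size blocks: 1 if any simple cell in the block fired."""
    h, w = len(binary_map), len(binary_map[0])
    return [
        [
            int(any(
                binary_map[i][j]
                for i in range(r, min(r + size, h))
                for j in range(c, min(c + size, w))
            ))
            for c in range(0, w, size)
        ]
        for r in range(0, h, size)
    ]


# Two 4x4 images: a vertical bar in column 2, and the same bar shifted to column 3.
image_a = [[0, 0, 1, 0] for _ in range(4)]
image_b = [[0, 0, 0, 1] for _ in range(4)]
vertical_stroke = [[1], [1], [1]]  # 3x1 template (assumed stimulus)

simple_a = simple_cell_map(image_a, vertical_stroke, threshold=3)
simple_b = simple_cell_map(image_b, vertical_stroke, threshold=3)
complex_a = complex_cell_map(simple_a, size=2)
complex_b = complex_cell_map(simple_b, size=2)
```

In this toy case the simple-cell maps of the two images differ, because the bar moved by one pixel, while the complex-cell maps are identical, because both positions fall inside the same pooling block.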

Experiments

In the experiments we evaluate the performance of MTC combined with a linear SVM.

An SVM, as originally proposed, solves a binary classification problem [36]. For the multi-class problem we used the ‘one-against-one’ approach [37], [38]. We therefore solve a binary classification problem for every combination of two classes, training $k (k - 1) / 2$ binary classifiers, where $k$ is the number of classes. The outputs of the binary classifiers are then combined by voting [39]. Another possible approach is the ‘one-against-all’, for a comparison
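The ‘one-against-one’ scheme with voting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `binary_predict` is a hypothetical callable standing in for the trained pairwise linear SVMs, the toy decision rule below replaces real classifiers, and the tie-breaking rule (smallest label wins) is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(k):
    """Number of binary classifiers for k classes: k (k - 1) / 2."""
    return k * (k - 1) // 2


def one_against_one_predict(x, classes, binary_predict):
    """Combine the pairwise decisions for sample x by majority vote.

    binary_predict(a, b, x) is a hypothetical callable returning a or b,
    the class chosen by the (a, b) binary classifier.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, x)] += 1
    best = max(votes.values())
    # Tie-breaking by smallest label is an assumption of this sketch.
    return min(c for c, v in votes.items() if v == best)


def toy_binary_predict(a, b, x):
    """Toy stand-in for a trained SVM: pick the class label nearer to the scalar x."""
    return a if abs(x - a) <= abs(x - b) else b


digits = list(range(10))
n_classifiers = num_pairwise_classifiers(len(digits))  # 45 for the ten digit classes
predicted = one_against_one_predict(6.8, digits, toy_binary_predict)  # class 7 wins all 9 of its pairs
```

For the ten digit classes the formula gives 45 binary classifiers, and the predicted class is the one that collects the most votes across them.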

Conclusion

We evaluated the combination of MTC with a linear classifier. MTC showed good generalization for a small number of training examples. The combination of MTC and a linear SVM achieved competitive error rates on both the USPS (2.64%) and MNIST (0.71%) datasets. MTC greatly improves the results relative to using a deep belief network with a linear SVM [48]. It is also interesting that in [27] quasi-binary codes are unsuitable for classification, while the MTC binary codes can be used for classification

Acknowledgments

The authors would like to thank João Sacramento for many helpful comments. This work was supported by Fundação para a Ciência e Tecnologia (INESC-ID multiannual funding) through the PIDDAC Program funds and through an individual doctoral grant awarded to the first author (Contract SFRH/BD/61513/2009).

Ângelo Cardoso received an MSc in Information Systems and Computer Engineering from the Instituto Superior Técnico (IST), Technical University of Lisbon (TU-Lisbon) in 2007. Since 2009 he has been a PhD student at IST, TU-Lisbon and INESC-ID. His PhD work is supported by an individual scholarship from Fundação para a Ciência e Tecnologia (FCT). He is currently working on biological models for object recognition and machine learning.

References (48)

Â. Cardoso et al., Neocognitron and the map transformation cascade, Neural Networks (2010)
N. Logothetis et al., Shape representation in the inferior temporal cortex of monkeys, Curr. Biol. (1995)
K. Fukushima, Neocognitron: a hierarchical neural network capable of visual pattern recognition, Neural Networks (1988)
K. Fukushima, Increasing robustness against background noise: visual pattern recognition by a neocognitron, Neural Networks (2011)
K. Fukushima, Neocognitron for handwritten digit recognition, Neurocomputing (2003)
C.L. Liu et al., Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recognition (2003)
D. Hubel et al., Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat, J. Neurophysiol. (1965)
K. Fukushima, Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybern. (1980)
M. Riesenhuber et al., Hierarchical models of object recognition in cortex, Nat. Neurosci. (1999)
T. Serre et al., A feedforward architecture accounts for rapid categorization, Proc. Natl. Acad. Sci. U.S.A. (2007)
D. Hubel et al., Eye, Brain, and Vision (1988)
D.H. Hubel et al., Uniformity of monkey striate cortex: a parallel relationship between field size, scatter, and magnification factor, J. Comp. Neurol. (1974)
E. Kobatake et al., Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, J. Neurophysiol. (1994)
H. Bülthoff et al., Psychophysical support for a two-dimensional view interpolation theory of object recognition, Proc. Natl. Acad. Sci. U.S.A. (1992)
M.C. Booth et al., View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex, Cerebral Cortex (1998)
D.H. Hubel et al., Sequence regularity and geometry of orientation columns in the monkey striate cortex, J. Comp. Neurol. (1974)
C. Blakemore et al., Lateral inhibition between orientation detectors in the cat's visual cortex, Exp. Brain Res. (1972)
K. Fukushima, Cognitron: a self-organizing multilayered neural network, Biol. Cybern. (1975)
M. Riesenhuber, How a Part of the Brain Might or Might Not Work: A New Hierarchical Model of Object Recognition, Ph.D....
T. Serre, M. Riesenhuber, Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for...
R. Miikkulainen et al., Computational Maps in the Visual Cortex (2005)
Y. Lecun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...
D. Lowe, Towards a computational model for object recognition in IT cortex, in: Biologically Motivated Computer Vision,...


Andreas Wichert studied computer science at the University of Saarland, where he graduated in 1993. Afterwards, he became a PhD student at the Department of Neural Information Processing, University of Ulm. He received a PhD in computer science in 2000. He has since worked in the field of fMRI as a researcher with an interdisciplinary group, Department of Psychiatry III Ulm, changing to F&K Delvotec bonding machines where he led the development of a diagnostic expert system. From 2004 to 2005 he was the scientific director of MITI Research Group Klinikum rechts der Isar of the Technical University Munich. Recently he joined the Faculdade de Ciências da Universidade de Lisboa Departamento de Informática and Departamento de Informática, Universidade Técnica de Lisboa (DEI-IST).
