Pattern Recognition Letters

Volume 35, 1 January 2014, Pages 149-156

Handwriting word recognition using windowed Bernoulli HMMs

https://doi.org/10.1016/j.patrec.2012.09.002

Abstract

Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, where the state-conditional probability density functions in each HMM are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to by-pass feature extraction and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this work, column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. Using these windowed Bernoulli mixture HMMs, good results are reported on the well-known IAM and RIMES databases of Latin script and, in particular, state-of-the-art results are provided on the IfN/ENIT database of Arabic handwritten words.

Highlights

► Binarized handwritten text images are directly fed into Bernoulli HMMs.
► We extend conventional BHMMs by means of a sliding window with repositioning.
► Windowed BHMMs are tested on the IAM-words, RIMES and IfN/ENIT databases.
► Windowed BHMMs clearly outperform conventional BHMMs.
► We obtain state-of-the-art results on IfN/ENIT and good results on IAM and RIMES.

Introduction

Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in languages with Latin and Arabic scripts (Dehghan et al., 2001, Günter and Bunke, 2004, Märgner and El Abed, 2007, Märgner and El Abed, 2009, Grosicki and El Abed, 2009). Following the conventional approach in speech recognition (Rabiner and Juang, 1993), HMMs at global (line or word) level are built from shared, embedded HMMs at character (subword) level, which are usually simple in terms of number of states and topology. In the common case of real-valued feature vectors, state-conditional probability (density) functions are modeled as Gaussian mixtures since, as with finite mixture models in general, their complexity can be easily adjusted to the available training data by simply varying the number of components.

After decades of research in speech recognition, the use of certain real-valued speech features together with embedded Gaussian (mixture) HMMs is a de-facto standard (Rabiner and Juang, 1993). However, in the case of handwriting recognition, there is no such standard and, indeed, very different sets of features are in use today. In Giménez and Juan (2009) we proposed to by-pass feature extraction and to directly feed columns of raw, binary pixels into embedded Bernoulli (mixture) HMMs (BHMMs), that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The basic idea is to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In Giménez et al. (2010), we improved our basic approach by using a sliding window of adequate width to better capture image context at each horizontal position of the text image. This improvement, which we refer to as windowed BHMMs, achieved very competitive results on the well-known IfN/ENIT database of Arabic town names (Pechwitz et al., 2002).

Although windowed BHMMs achieved good results on IfN/ENIT, it was clear to us that text distortions are more difficult to model with wide windows than with narrow (e.g. one-column) windows. In order to circumvent this difficulty, we have considered new, adaptive window sampling techniques, as opposed to the conventional, direct strategy by which the sampling window center is applied at a constant height of the text image and moved horizontally one pixel at a time. More precisely, these adaptive techniques can be seen as an application of the direct strategy followed by a repositioning step, by which the sampling window is repositioned to align its center with the center of gravity of the sampled image. This repositioning step can be done horizontally, vertically or in both directions. Although vertical repositioning was expected to have more influence on recognition results than horizontal repositioning, we decided to study both separately, and also in conjunction, so as to confirm this expectation.
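To make the repositioning step concrete, the sketch below (our own illustration in Python with NumPy; the function names and the cropping and padding conventions are hypothetical, not taken from the paper) first samples a window the direct way and then re-samples it centered on the center of gravity of the ink it contains:

```python
import numpy as np

def extract_window(img, t, W, mode="vertical"):
    """Direct window sampling at position t followed by a repositioning
    step that aligns the window center with the center of gravity of the
    sampled ink. Illustrative sketch; not the paper's exact implementation.

    img  : (H, T) binary image (1 = ink), rows = height, columns = positions
    mode : "vertical", "horizontal" or "both"
    """
    H, T = img.shape
    half = W // 2

    def crop(cy, cx):
        # Background-padded H x W crop centered at image row cy, column cx.
        win = np.zeros((H, W), dtype=img.dtype)
        for j in range(W):
            x = cx - half + j
            if not (0 <= x < T):
                continue
            for i in range(H):
                y = cy - H // 2 + i
                if 0 <= y < H:
                    win[i, j] = img[y, x]
        return win

    win = crop(H // 2, t)  # conventional, direct strategy
    ys, xs = np.nonzero(win)
    if ys.size == 0:       # empty window: nothing to reposition
        return win
    cog_y = int(round(ys.mean()))             # ink center of gravity (image rows)
    cog_x = t - half + int(round(xs.mean()))  # ... mapped to image columns
    cy = cog_y if mode in ("vertical", "both") else H // 2
    cx = cog_x if mode in ("horizontal", "both") else t
    return crop(cy, cx)    # repositioned sample
```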

In this paper, the repositioning techniques described above are introduced and extensively tested on different, well-known databases for off-line handwriting recognition. In particular, we provide new, state-of-the-art results on the IfN/ENIT database, which clearly outperform our previous results without repositioning (Giménez et al., 2010). Indeed, the first tests of our windowed BHMM system with vertical repositioning on IfN/ENIT were carried out at the ICFHR 2010 Arabic handwriting recognition competition, where our system ranked first (Märgner and El Abed, 2010). Moreover, the test sets used in this competition were also used in a new competition at ICDAR 2011, and none of the participants improved on the results achieved by our system at ICFHR 2010 (Märgner and El Abed, 2011). Apart from state-of-the-art results on IfN/ENIT, we also provide new empirical results on the IAM database of English words (Marti and Bunke, 2002) and the RIMES database of French words (Grosicki et al., 2009). Our windowed BHMM system with vertical repositioning achieves good results on both databases.

In what follows, we briefly review Bernoulli mixtures (Section 2), BHMMs (Section 3), maximum likelihood parameter estimation (Section 4) and windowed BHMMs with repositioning techniques (Section 5). Empirical results are then reported in Section 6 and concluding remarks are given in Section 7.

Section snippets

Bernoulli mixture

Let $\boldsymbol{o}$ be a $D$-dimensional feature vector. A finite mixture is a probability (density) function of the form
$$P(\boldsymbol{o}\mid\Theta)=\sum_{k=1}^{K}\pi_k\,P(\boldsymbol{o}\mid k,\Theta_k),$$
where $K$ is the number of mixture components, $\pi_k$ is the $k$th component coefficient, and $P(\boldsymbol{o}\mid k,\Theta_k)$ is the $k$th component-conditional probability (density) function. The mixture is controlled by a parameter vector $\Theta$ comprising the mixture coefficients and a parameter vector for the components, $\Theta_k$. It can be seen as a generative model that first selects the $k$th component with probability $\pi_k$ and then generates $\boldsymbol{o}$ in accordance with $P(\boldsymbol{o}\mid k,\Theta_k)$.
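In a Bernoulli mixture, each component-conditional function is a product of independent pixel-wise Bernoulli probabilities. As a minimal sketch (in Python with NumPy; the function name and toy values are ours, not the paper's), the mixture probability of a single binary column vector can be evaluated as follows:

```python
import numpy as np

def bernoulli_mixture_prob(o, pi, p):
    """P(o | Theta) for a binary vector o under a Bernoulli mixture.

    o  : (D,) binary feature vector (0/1 entries)
    pi : (K,) mixture coefficients (sum to one)
    p  : (K, D) Bernoulli prototypes, p[k, d] = P(o_d = 1 | k)
    """
    # Component-conditionals: prod_d p_kd^o_d (1 - p_kd)^(1 - o_d)
    comp = np.prod(p ** o * (1.0 - p) ** (1 - o), axis=1)  # shape (K,)
    return float(pi @ comp)

# Toy usage: a 5-pixel binary column and a 2-component mixture.
o = np.array([1, 1, 0, 0, 1])
pi = np.array([0.6, 0.4])
p = np.array([[0.9, 0.8, 0.1, 0.2, 0.7],
              [0.3, 0.4, 0.6, 0.5, 0.2]])
print(bernoulli_mixture_prob(o, pi, p))
```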

Bernoulli HMM

Let $O=(\boldsymbol{o}_1,\ldots,\boldsymbol{o}_T)$ be a sequence of feature vectors. An HMM is a probability (density) function of the form
$$P(O\mid\Theta)=\sum_{q_1,\ldots,q_T}\prod_{t=0}^{T}a_{q_t q_{t+1}}\prod_{t=1}^{T}b_{q_t}(\boldsymbol{o}_t),$$
where the sum is over all possible paths (state sequences) $q_0,\ldots,q_{T+1}$ such that $q_0=I$ (special initial or start state), $q_{T+1}=F$ (special final or stop state), and $q_1,\ldots,q_T\in\{1,\ldots,M\}$, with $M$ the number of regular (non-special) states of the HMM. For any regular states $i$ and $j$, $a_{ij}$ denotes the transition probability from $i$ to $j$, while $b_j$ denotes the observation probability (density) function in state $j$; in a BHMM, each $b_j$ is modeled as a Bernoulli mixture.
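The sum over paths can be evaluated efficiently with the standard forward algorithm in $O(TM^2)$ time. The sketch below is our own illustration, not the paper's implementation: for brevity it assumes a single Bernoulli prototype per state instead of a full mixture, and all names are hypothetical. It works in the log domain for numerical stability:

```python
import numpy as np

def bhmm_log_likelihood(O, a_init, A, a_fin, P):
    """log P(O | Theta) via the forward algorithm, in the log domain.

    O      : (T, D) sequence of binary column vectors
    a_init : (M,)   transitions from the start state I to each regular state
    A      : (M, M) transitions among the M regular states
    a_fin  : (M,)   transitions from each regular state to the final state F
    P      : (M, D) one Bernoulli prototype per state (a mixture per state
             would replace the emission computation below)
    """
    # Log emission probabilities log b_j(o_t), shape (T, M).
    logB = O @ np.log(P).T + (1 - O) @ np.log(1 - P).T
    alpha = np.log(a_init) + logB[0]
    for t in range(1, len(O)):
        m = alpha.max()  # log-sum-exp over predecessor states
        alpha = m + np.log(np.exp(alpha - m) @ A) + logB[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m) @ a_fin)
```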

Maximum likelihood parameter estimation

Maximum likelihood estimation (MLE) of the parameters governing an embedded BHMM does not differ significantly from the conventional Gaussian case, and it is also efficiently performed using the well-known EM (Baum-Welch) re-estimation formulae (Rabiner and Juang, 1993, Young et al., 1995). Let $(O_1,S_1),\ldots,(O_N,S_N)$ be a collection of $N$ training samples in which the $n$th observation has length $T_n$, $O_n=(\boldsymbol{o}_{n1},\ldots,\boldsymbol{o}_{nT_n})$, and corresponds to a sequence of $L_n$ symbols ($L_n\leq T_n$), $S_n=(s_{n1},\ldots,s_{nL_n})$. At iteration $r$…
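The snippet breaks off before the re-estimation formulae, but the underlying E/M pattern can be illustrated on a plain (non-embedded) Bernoulli mixture. The sketch below is our own simplified illustration rather than the paper's formulae; in the embedded BHMM case each count is additionally weighted by state-occupancy probabilities obtained with the forward-backward procedure:

```python
import numpy as np

def em_bernoulli_mixture(X, K, iters=50, eps=1e-6, seed=0):
    """EM re-estimation for a plain (non-embedded) Bernoulli mixture.

    X : (N, D) binary training vectors; K : number of components.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                  # uniform initial coefficients
    p = rng.uniform(0.25, 0.75, size=(K, D))  # random initial prototypes
    for _ in range(iters):
        # E-step: responsibilities z[n, k] proportional to pi_k * P(x_n | k).
        logw = np.log(pi) + X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        logw -= logw.max(axis=1, keepdims=True)
        z = np.exp(logw)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: weighted relative frequencies (clipped away from 0 and 1).
        Nk = z.sum(axis=0)
        pi = Nk / N
        p = np.clip((z.T @ X) / Nk[:, None], eps, 1 - eps)
    return pi, p
```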

Windowed BHMMs

Given a binary image normalized in height to $H$ pixels, we may think of a feature vector $\boldsymbol{o}_t$ as its column at position $t$ or, more generally, as a concatenation of the columns in a window of $W$ columns in width centered at position $t$. This generalization affects neither the definition of the BHMM nor its MLE, yet it can be very helpful to better capture the image context at each horizontal position of the image. As an example, Fig. 2 shows a binary image of 4 columns and 5 rows, which…
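A minimal sketch of this plain windowing step (our own illustration; names are hypothetical), assuming an $H\times T$ binary image stored as a NumPy array, with background (zero) padding beyond the image borders:

```python
import numpy as np

def windowed_features(img, W):
    """Concatenate, at each horizontal position t, the W columns of the
    window centered there into a single H*W-dimensional bit vector.
    Plain windowing only; repositioning is applied on top of this.

    img : (H, T) binary image; returns a (T, H * W) array.
    """
    H, T = img.shape
    half = W // 2
    padded = np.zeros((H, T + W - 1), dtype=img.dtype)
    padded[:, half:half + T] = img
    # Feature vector at t: the window's columns stacked one after another.
    return np.stack([padded[:, t:t + W].T.flatten() for t in range(T)])
```

For instance, for a binary image of 4 columns and 5 rows as in Fig. 2, $W=3$ yields four 15-dimensional feature vectors, while $W=1$ recovers the original column bit vectors.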

Experiments

Our windowed BHMMs and the repositioning techniques described above were tested on three well-known databases of handwritten words: the IfN/ENIT database (Pechwitz et al., 2002), IAM words (Marti and Bunke, 2002) and RIMES (Grosicki et al., 2009). In what follows, we describe experiments and results on each database separately.

Concluding remarks

Windowed Bernoulli mixture HMMs (BHMMs) for handwriting word recognition have been described and improved by the introduction of window repositioning techniques. In particular, we have considered three techniques of window repositioning after window extraction: vertical, horizontal, and both. They differ only in the direction in which extracted windows are shifted to align mass and window centers (vertically, horizontally, or in both directions). In this work, these repositioning…

Acknowledgments

This work was supported by the EC (FEDER/FSE), the Spanish MICINN (MIPRCV “Consolider Ingenio 2010”, iTrans2 TIN2009-14511, MITTRAL TIN2009-14633-C03-01, erudito.com TSI-020110-2009-439, and an AECID 2010/11 grant).

References (19)

  • Dehghan, M., et al., 2001. Handwritten Farsi (Arabic) word recognition: a holistic approach using discrete HMM. Pattern Recognition.
  • Günter, S., et al., 2004. HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition.
  • Bianne-Bernard, A.L., et al., 2011. Dynamic and contextual information in HMM modeling for handwritten word recognition. IEEE Trans. Pattern Anal. Machine Intell.
  • Dreuw, P., Heigold, G., Ney, H., 2009. Confidence-based discriminative training for model adaptation in offline Arabic…
  • Giménez, A., Juan, A., 2009. Embedded Bernoulli mixture HMMs for handwritten word recognition. In: ICDAR’09, Barcelona,…
  • Giménez, A., Khoury, I., Juan, A., 2010. Windowed Bernoulli mixture HMMs for Arabic handwritten word recognition. In:…
  • Grosicki, E., El Abed, H., 2009. ICDAR 2009 handwriting recognition competition. In: ICDAR’09, Barcelona, Spain, pp.…
  • Grosicki, E., El Abed, H., 2011. ICDAR 2011 – French handwriting recognition competition. In: ICDAR’11, Beijing, China,…
  • Grosicki, E., Carré, M., Brodin, J.M., Geoffrois, E., 2009. Results of the RIMES evaluation campaign for handwritten…
