Feature selection algorithms for the generation of multiple classifier systems and their application to handwritten word recognition

https://doi.org/10.1016/j.patrec.2004.05.002

Abstract

The study of multiple classifier systems has recently become an area of intensive research in pattern recognition. In handwriting recognition, too, systems combining several classifiers have been investigated. In this paper, new methods for the creation of classifier ensembles based on feature selection algorithms are introduced. These new methods are evaluated and compared to existing approaches in the context of handwritten word recognition, using a hidden Markov model recognizer as the base classifier.

Introduction

The field of off-line handwriting recognition has been a topic of intensive research for many years. At first, only the recognition of isolated handwritten characters was investigated (Suen et al., 1992); later, whole words (Simon, 1992) were addressed. Most of the systems reported in the literature to date consider constrained recognition problems based on vocabularies from specific domains, e.g. postal addresses (Kaltenmeier et al., 1993; Kim et al., 1999) or amounts on bank checks (Impedovo et al., 1997). Free handwriting recognition, without domain-specific constraints and with large vocabularies, has been addressed only recently in a few papers (Kim et al., 1999; Marti and Bunke, 2001). The recognition rate of such systems is still low, and there is a need to improve it.

The combination of multiple classifiers has been shown to improve recognition performance on difficult classification problems (Kittler and Roli, 2000). In handwriting recognition in particular, classifier combination has often been applied; examples are given in (Huang and Suen, 1993; Lee and Srihari, 1993; Lee and Gomes, 1997; Maruyama and Nakano, 2000; Xu et al., 1992).

Recently a number of classifier combination methods, called ensemble methods, have been proposed in the field of machine learning (Bauer and Kohavi, 1999, Opitz and Maclin, 1999). Given a single classifier, called the base classifier, a set of classifiers can be automatically generated by changing the training set (Breiman, 1996), the input features (Ho, 1998), the input data by injecting randomness, or the parameters and architecture of the classifier. A summary of such methods is given in (Dietterich, 2000).

In this paper we propose a new ensemble creation method that is based on the selection of particular subsets of features. The problem of feature selection has been a focus of research in pattern recognition for a long time (Jain and Zongker, 1997). In statistical pattern recognition (Jain et al., 2000) features are typically vectors of real numbers, whereas in structural pattern recognition (Fu, 1982) symbolic data structures, such as strings, trees, and graphs, are used. In this paper we follow the statistical approach.

In general, the choice of features is application dependent and often very difficult. A way to address this problem is to start with a set of candidate features and either modify or reduce them. The modification of a set of features includes transformations of the feature space, for example, Principal Component Analysis (Jolliffe, 1986). There exist many ways to reduce a given set of features (Kittler et al., 2001), and they all need an objective function to assess the quality of the considered feature subset. Often a validation set is used on which the recognition rate is measured. With such an objective function, the selection of the best subset of features becomes a search problem. An optimal approach is to calculate the objective function for all possible subsets, but this is only feasible for a rather small number of features. Branch and bound algorithms (Narendra and Fukunaga, 1977) use a priori knowledge to reduce the search space. However, often no such a priori knowledge is available, and a suboptimal approach, such as sequential forward/backward search (Devijver and Kittler, 1982), floating search (Pudil et al., 1994) or oscillating search (Somol and Pudil, 2000), is used. An ensemble method using feature selection was introduced in (Alkoot and Kittler, 2000). In this method, a classifier always selects the single feature that leads to the best recognition rate of the whole classifier ensemble. In contrast to the algorithm proposed in the current paper, a feature may be used by only one classifier, and the selection of features is not reversible, i.e. features that have been selected cannot be abandoned at a later time. In (Oza and Tumer, 2001) a method was presented that creates one classifier per class; each such classifier uses the set of features best able to discriminate its class from all the others.
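To make the search problem concrete, the following is a minimal sketch of sequential forward search, one of the suboptimal strategies mentioned above. The objective function here is a toy stand-in for the validation-set recognition rate used in the paper:

```python
def sequential_forward_search(features, objective, k):
    """Greedy forward selection: grow a subset one feature at a time,
    always adding the feature that maximizes the objective."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: objective(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy objective: reward "useful" features, lightly penalize subset size.
target = {0, 2, 5}
score = lambda subset: len(target & set(subset)) - 0.01 * len(subset)
print(sequential_forward_search(range(8), score, 3))  # → [0, 2, 5]
```

Note that this greedy scheme never revisits a choice; the floating and oscillating variants cited above exist precisely to undo earlier additions when that helps.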

The contribution of the present paper is twofold. First, to the knowledge of the authors, it is the first study that applies classifier ensemble creation methods using feature subsets in the field of handwriting recognition. Second, we propose a new method for classifier ensemble creation based on feature subsets. The new algorithm applies feature subset search algorithms that allow individual features to be added and removed, in order to find useful subsets of features on which individual classifiers are trained. Multiple search algorithms are incorporated to increase the diversity of the resulting classifiers. Four different objective functions for assessing the quality of a feature set are proposed. In a series of experiments, the new classifier ensemble creation method is compared to single classifiers and to multiple classifier systems generated by previously proposed methods.

The rest of the paper is organized as follows. Section 2 provides a review of some classical ensemble methods. In Section 3 the basic idea underlying the new ensemble creation method is described. The following section introduces some refinements to the basic algorithm. In Section 5 a description of the feature selection algorithm used in the experiments is given. Section 6 contains information about the handwritten word recognizer, which is used as the base classifier in the experiments. Those experiments are then discussed in Section 7 and, finally, conclusions are drawn in Section 8.

Section snippets

Classical ensemble methods

In this section the classical ensemble methods used in the multiple classifier systems introduced in this paper are described. Those methods were adopted, some with minor modifications, from the literature. Each method takes a base classifier and a training set as input and returns a number of trained instances of the base classifier as a result. More information about ensemble methods may be found in (Bauer and Kohavi, 1999, Opitz and Maclin, 1999).
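As an illustration of the interface just described (base classifier and training set in, trained instances out), bagging (Breiman, 1996), one of the classical methods reviewed here, can be sketched in a few lines. The majority-vote base learner is a toy stand-in, not the paper's HMM recognizer:

```python
import random

def bagging(train, base_learner, n_classifiers, seed=0):
    """Bagging: train each ensemble member on a bootstrap replicate
    of the training set (sampled with replacement, same size)."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_classifiers):
        replicate = [rng.choice(train) for _ in train]
        ensemble.append(base_learner(replicate))
    return ensemble

# Toy base learner: always predicts the majority label of its training data.
def majority_learner(data):
    labels = [y for _, y in data]
    vote = max(set(labels), key=labels.count)
    return lambda x: vote

train = [((i,), i % 2) for i in range(10)]
ensemble = bagging(train, majority_learner, n_classifiers=5)
print(len(ensemble))  # 5 trained instances of the base classifier
```

Diversity comes purely from the resampled training sets; the base learner and its input features are identical across all members.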

New ensemble creation method

One of the most successful ensemble creation algorithms is the random subspace method (Ho, 1998). As described in the last section, each individual classifier uses its own subset of the given features for both training and testing. The subsets contain elements randomly chosen out of the whole feature set, where the size of the feature subset is usually the same for each classifier in the ensemble. The main goal of the algorithm is not to select good subsets of features, but to create diverse
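The subset-generation step of the random subspace method can be sketched as follows; the subsets are purely random and of fixed size, which is exactly the property the new method replaces with guided search:

```python
import random

def random_subspace(features, n_classifiers, subset_size, seed=0):
    """Random subspace method (Ho, 1998): each classifier receives a
    random, fixed-size subset of the full feature set; both training
    and testing of that classifier use only these features."""
    rng = random.Random(seed)
    return [sorted(rng.sample(features, subset_size))
            for _ in range(n_classifiers)]

subsets = random_subspace(list(range(20)), n_classifiers=4, subset_size=10)
for s in subsets:
    print(s)
```

Each printed subset would then define the feature view of one ensemble member.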

Enhancements to the proposed ensemble creation method

As mentioned before, a good ensemble consists of classifiers that are as diverse as possible. One way to create diversity among the classifiers produced by the algorithm of Table 1 is to use one of the objective functions fi introduced in the last section. Another way is to use different feature subset search algorithms. The aim of the ensemble creation method proposed in this paper is to create several classifiers which use well-performing subsets of the original features. How the individual

Using the new ensemble creation method in conjunction with modified “plus 1–take away 1” algorithm

In the previous sections, the particular feature selection algorithm to be used for classifier ensemble generation was left unspecified. In this section we describe the selection algorithm actually adopted in our system. It is a modified version of the “plus 1–take away 1” algorithm (Pudil et al., 1994). The algorithm is changed so that a feature is only added or removed from the working feature set if the objective function is increased by that action. Like in the floating search method (Pudil
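The modified search rule (a step is committed only if the objective strictly increases) can be sketched as follows. The objective function is again a toy stand-in for the validation-set recognition rate; the structure of the add/remove alternation follows the description above:

```python
def plus1_takeaway1(features, objective, max_iter=50):
    """Modified 'plus 1 - take away 1' search: alternately try to add
    the best single feature and remove the least useful one, but commit
    a step only if it strictly increases the objective (unlike the
    classical variant, which always performs both steps)."""
    current, best = [], objective([])
    for _ in range(max_iter):
        improved = False
        # Plus-1 step: tentatively add the candidate with the highest gain.
        candidates = [f for f in features if f not in current]
        if candidates:
            f = max(candidates, key=lambda f: objective(current + [f]))
            if objective(current + [f]) > best:
                current.append(f)
                best = objective(current)
                improved = True
        # Take-away-1 step: tentatively drop the feature whose removal helps most.
        if current:
            f = max(current,
                    key=lambda f: objective([g for g in current if g != f]))
            reduced = [g for g in current if g != f]
            if objective(reduced) > best:
                current, best = reduced, objective(reduced)
                improved = True
        if not improved:       # neither step helped: search has converged
            break
    return current, best

# Toy objective: reward features 1 and 3, lightly penalize subset size.
score = lambda s: len({1, 3} & set(s)) - 0.1 * len(s)
subset, value = plus1_takeaway1(list(range(5)), score)
print(sorted(subset), round(value, 2))  # → [1, 3] 1.8
```

Because a removal is only accepted when it improves the objective, the working set can shrink again after growing, which is what distinguishes this family of methods from plain forward search.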

Handwritten word recognizer

The application considered in this paper is the off-line recognition of cursively handwritten words. An HMM-based recognizer, similar to the one described in (Marti and Bunke, 2001), is used as the base classifier. We assume that each handwritten word input to the recognizer has been normalized with respect to slant, skew, baseline location and height (for details of the normalization procedures see (Marti and Bunke, 2001)). A sliding window of one pixel width is moved from left
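The sliding-window scheme turns a word image into a left-to-right sequence of feature vectors, which is what the HMM consumes. The sketch below illustrates the mechanism on a binary image; the three per-column features shown are illustrative stand-ins, not the recognizer's actual feature set:

```python
def column_features(img):
    """Per-column features from a binary word image (list of rows,
    1 = ink). The window is one pixel wide and moves left to right;
    each position yields one feature vector, so the word becomes an
    observation sequence for the HMM."""
    height, width = len(img), len(img[0])
    seq = []
    for x in range(width):
        ink = [y for y in range(height) if img[y][x]]
        n = len(ink)                         # number of ink pixels
        top = ink[0] if ink else height      # upper contour position
        bottom = ink[-1] if ink else 0       # lower contour position
        seq.append((n, top, bottom))
    return seq

img = [[0, 1, 0],
       [1, 1, 0],
       [1, 0, 0]]
print(column_features(img))  # → [(2, 1, 2), (2, 0, 1), (0, 3, 0)]
```

Feature selection in the ensemble method then operates on which of these per-window features each classifier is allowed to see.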

Experiments

The new classifier ensemble generation method proposed in this paper was implemented and experimentally evaluated. For the experiments, words from a part of the handwritten sentence database described in (Marti and Bunke, 2002; Zimmermann and Bunke, 2002) were used. The training set contains 8795 words, while the validation and test sets consist of 1066 words each. The vocabulary underlying the data set comprises 2296 words. A few sample words used in the experiments are shown in Appendix A. The

Conclusions

A new ensemble creation method for the construction of multiple classifier systems was introduced. The new method applies feature subset search algorithms in order to find different subsets of the given features. Each of these subsets is then used by a classifier. By selecting well-performing feature subsets rather than just random ones, as is done in the random subspace method (Ho, 1998), an improved performance of the ensemble is achieved. Four new objective functions were introduced, which

Acknowledgements

The research was supported by the Swiss National Science Foundation (No. 20-52087.97). The authors thank Dr. Urs-Victor Marti for providing the handwritten word recognizer and Matthias Zimmermann for the segmentation of a part of the IAM database. Additional funding was provided by the Swiss National Science Foundation NCCR program “Interactive Multimodal Information Management (IM)2” in the Individual Project “Scene Analysis”. A preliminary version of the paper appeared in (Günter and Bunke,

References (39)

  • Y. Freund

    Boosting a weak learning algorithm by majority

    Inform. Comput.

    (1995)
  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    J. Comput. Syst. Sci.

    (1997)
  • P. Pudil et al.

    Floating search methods in feature selection

    Pattern Recognition Lett.

    (1994)
  • Alkoot, F., Kittler, J., 2000. Feature selection for an ensemble of classifiers, in: Proc. 4th World Multiconf....
  • E. Bauer et al.

    An empirical comparison of voting classification algorithms: Bagging, boosting, and variants

    Mach. Learning J.

    (1999)
  • L. Breiman

    Bagging predictors

    Mach. Learning

    (1996)
  • P.A. Devijver et al.

    Pattern Recognition: A Statistical Approach

    (1982)
  • Dietterich, T.G., 2000. Ensemble methods in machine learning. in: (Kittler and Roli, 2000). pp....
  • K. Fu

    Syntactic Pattern Recognition and Applications

    (1982)
  • Günter, S., Bunke, H., 2002. Creation of classifier ensembles for handwritten word recognition using feature selection...
  • T.K. Ho

    The random subspace method for constructing decision forests

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1998)
  • Huang, Y., Suen, C., 1993. Combination of multiple classifiers with measurement values, in: Proc. of 2nd Int. Conf....
  • Hull, D., 1993. Using statistical testing in the evaluation of retrieval experiments. in: Research and Development in...
  • A. Jain et al.

    Feature selection: Evaluation, application, and small sample performance

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • A. Jain et al.

    Statistical pattern recognition: A review

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • I.T. Jolliffe

    Principal Component Analysis

    (1986)
  • Kaltenmeier, A., Caesar, T., Gloger, J., Mandler, E., 1993. Sophisticated topology of hidden Markov models for cursive...
  • G. Kim et al.

    Architecture for handwritten text recognition systems
