
2002 | Book

Multiple Classifier Systems

Third International Workshop, MCS 2002, Cagliari, Italy, June 24–26, 2002. Proceedings

Edited by: Fabio Roli, Josef Kittler

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Invited Papers

Multiclassifier Systems: Back to the Future

While a variety of multiple classifier systems have been studied since at least the late 1950s, this area came alive in the 1990s with significant theoretical advances as well as numerous successful practical applications. This article argues that our current understanding of ensemble-type multiclassifier systems is now quite mature and exhorts the reader to consider a broader set of models and situations for further progress. Some of these scenarios have already been considered in classical pattern recognition literature, but revisiting them often leads to new insights and progress. As an example, we consider how to integrate multiple clusterings, a problem central to several emerging distributed data mining applications. We also revisit output space decomposition to show how this can lead to extraction of valuable domain knowledge in addition to improved classification accuracy.

Joydeep Ghosh
Support Vector Machines, Kernel Logistic Regression and Boosting

The support vector machine is known for its excellent performance in binary classification, i.e., the response y ∈ {−1, 1}, but its appropriate extension to the multi-class case is still an ongoing research issue. Another weakness of the SVM is that it only estimates sign[p(x) − 1/2], while the probability p(x) is often of interest itself, where p(x) = P(Y = 1∣X = x) is the conditional probability of a point being in class 1 given X = x. We propose a new approach for classification, called the import vector machine, which is built on kernel logistic regression (KLR). We show on some examples that the IVM performs as well as the SVM in binary classification. The IVM can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying class probabilities. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This can give the IVM a computational advantage over the SVM, especially when the size of the training data set is large. We illustrate these techniques on some examples, and make connections with boosting, another popular machine-learning method for classification.

Ji Zhu, Trevor Hastie
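
The mechanics of KLR over a reduced basis can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithm: a random subset stands in for the IVM's greedy import point selection, and all hyperparameter values are arbitrary.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klr_subset(X, y, n_basis=20, gamma=1.0, lam=1e-3,
                   lr=0.1, n_iter=500, seed=0):
    # Kernel logistic regression over a small set of basis points.
    # y must be in {0, 1}; a random subset replaces greedy selection.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), n_basis, replace=False)
    K = rbf_kernel(X, X[idx], gamma)              # (n, n_basis)
    a = np.zeros(n_basis)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-K @ a))          # estimated P(Y=1|x)
        a -= lr * (K.T @ (p - y) / len(X) + lam * a)
    return idx, a

X = np.random.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(float)         # XOR-like problem
idx, a = fit_klr_subset(X, y)
proba = 1.0 / (1.0 + np.exp(-rbf_kernel(X[:5], X[idx]) @ a))
print(proba)  # class probabilities, which a plain SVM does not provide
```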
Multiple Classification Systems in the Context of Feature Extraction and Selection

Parallels between Feature Extraction / Selection and Multiple Classification Systems methodologies are considered. Both approaches allow the designer to introduce prior information about the pattern recognition task to be solved. However, both are heavily affected by computational difficulties and by the problem of small sample size / classifier complexity. Neither approach is capable of selecting a unique data analysis algorithm.

Šarūnas Raudys

Bagging and Boosting

Boosted Tree Ensembles for Solving Multiclass Problems

In this paper we consider the combination of two ensemble techniques, both capable of producing diverse binary base classifiers. AdaBoost, a version of boosting, is combined with Output Coding for solving multiclass problems. Decision trees are chosen as the base classifiers, and the issue of tree pruning is addressed. Pruning produces less complex trees and sometimes leads to better generalisation. Experimental results demonstrate that pruning makes little difference in this framework; however, on average over nine benchmark datasets, better accuracy is achieved by incorporating unpruned trees.

Terry Windeatt, Gholamreza Ardeshir
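
Using scikit-learn as a stand-in, the combination the paper studies (Output Coding over boosted trees) can be assembled roughly as follows; the random code matrix, dataset, and parameter choices are illustrative, not those of the paper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each bit of the output code is a binary problem, solved here by a
# boosted ensemble of trees; max_depth=None leaves the trees unpruned,
# the configuration the abstract reports as slightly more accurate.
binary_learner = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=None), n_estimators=50)
ecoc = OutputCodeClassifier(binary_learner, code_size=2.0, random_state=0)

print(cross_val_score(ecoc, X, y, cv=5).mean())
```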
Distributed Pasting of Small Votes

Bagging and boosting are two popular ensemble methods that achieve better accuracy than a single classifier. These techniques have limitations on massive datasets, as the size of the dataset can be a bottleneck. Voting many classifiers built on small subsets of data (“pasting small votes”) is a promising approach for learning from massive datasets. Pasting small votes can utilize the power of boosting and bagging, and potentially scale up to massive datasets. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable to massive datasets.

N. V. Chawla, L. O. Hall, K. W. Bowyer, T. E. Moore Jr., W. P. Kegelmeyer
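
On a single machine, pasting small votes corresponds to training many classifiers on small subsets sampled without replacement; a rough scikit-learn analogue (without the paper's distributed framework or importance-sampled votes) might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# Many trees, each built on a small bite of the data (1%, drawn without
# replacement), combined by voting: the "pasting" idea in miniature.
pasting = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=200, max_samples=0.01,
    bootstrap=False, n_jobs=-1, random_state=0)

print(cross_val_score(pasting, X, y, cv=3).mean())
```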
Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy

In combining classifiers, it is believed that diverse ensembles perform better than non-diverse ones. In order to test this hypothesis, we study the accuracy and diversity of ensembles obtained by bagging and boosting applied to the nearest mean classifier. In our simulation study we consider two diversity measures: the Q statistic and the disagreement measure. The experiments, carried out on four data sets, have shown that both the diversity and the accuracy of the ensembles depend on the training sample size. With the exception of very small training sample sizes, both bagging and boosting are more useful when ensembles consist of diverse classifiers. However, in boosting the relationship between diversity and the efficiency of ensembles is much stronger than in bagging.

Marina Skurichina, Liudmila I. Kuncheva, Robert P. W. Duin
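
Both diversity measures used in the study have simple closed forms over the 2×2 table of joint correctness of a classifier pair; a small reference implementation:

```python
import numpy as np

def q_statistic(pred_a, pred_b, y):
    # Yule's Q over the joint correctness of two classifiers:
    # Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), in [-1, 1];
    # undefined (zero denominator) in degenerate cases.
    a, b = pred_a == y, pred_b == y
    n11 = np.sum(a & b);  n00 = np.sum(~a & ~b)
    n10 = np.sum(a & ~b); n01 = np.sum(~a & b)
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

def disagreement(pred_a, pred_b, y):
    # Fraction of samples on which exactly one classifier is correct.
    a, b = pred_a == y, pred_b == y
    return np.mean(a != b)
```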
Highlighting Hard Patterns via AdaBoost Weights Evolution

The dynamical evolution of weights in the AdaBoost algorithm contains useful information about the role that the associated data points play in the building of the AdaBoost model. In particular, the dynamics induces a bipartition of the data set into two (easy/hard) classes. Easy points have little influence on the making of the model, while the varying relevance of hard points can be gauged in terms of an entropy value associated with their evolution. Smooth approximations of entropy highlight regions where classification is most uncertain. Promising results are obtained when the proposed methods are applied in the Optimal Sampling framework.

Bruno Caprile, Cesare Furlanello, Stefano Merler
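
The abstract does not spell out its entropy definition, but the underlying object, the per-point weight trajectory across boosting rounds, is easy to expose. The entropy computed below over each trajectory is one plausible reading, not necessarily the paper's exact measure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_weight_history(X, y, n_rounds=50):
    # Discrete AdaBoost for y in {-1, +1}, recording the sample-weight
    # vector after every round.
    n = len(X)
    w = np.full(n, 1.0 / n)
    history = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        history.append(w.copy())
    return np.array(history)              # (n_rounds, n_samples)

def trajectory_entropy(history):
    # Normalise each point's trajectory into a distribution over rounds;
    # hard points whose relevance keeps shifting get high entropy.
    p = history / history.sum(axis=0, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=0)
```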
Using Diversity with Three Variants of Boosting: Aggressive, Conservative, and Inverse

We look at three variants of the boosting algorithm, called here Aggressive Boosting, Conservative Boosting and Inverse Boosting. We monitor the diversity measure Q alongside the accuracy during the progressive development of the ensembles, in the hope of being able to detect the point of “paralysis” of the training, if any. Three data sets are used: the artificial Cone-Torus data, the UCI Pima Indians Diabetes data, and the Phoneme data. We run each of the three boosting variants with two base classifier models: the quadratic classifier and a multi-layer perceptron (MLP) neural network. The three variants show different behavior, in most cases favoring Conservative Boosting.

Ludmila I. Kuncheva, Christopher J. Whitaker

Ensemble Learning and Neural Networks

Multistage Neural Network Ensembles

Neural network ensembles (sometimes referred to as committees or classifier ensembles) are effective techniques for improving the generalization of a neural network system. Combining a set of neural network classifiers whose error distributions are diverse can generate more accurate results than any single network. Combination strategies commonly used in ensembles include simple averaging, weighted averaging, majority voting and ranking. However, each method has its limitations, whether in the range of applications it suits or in its effectiveness. This paper proposes a new ensemble combination scheme called multistage neural network ensembles. Experimental investigations based on multistage neural network ensembles are presented, and the benefit of using this approach as an additional combination method in ensembles is demonstrated.

Shuang Yang, Antony Browne, Philip D. Picton
Forward and Backward Selection in Regression Hybrid Network

We introduce a Forward Backward and Model Selection algorithm (FBMS) for constructing a hybrid regression network of radial and perceptron hidden units. The algorithm determines whether a radial or a perceptron unit is required at a given region of input space. Given an error target, the algorithm also determines the number of hidden units. The algorithm then applies model selection criteria and prunes unnecessary weights. This results in a final architecture which is often much smaller than an RBF network or an MLP. Results for various data sizes on the Pumadyn data indicate that the resulting architecture competes with and often outperforms the best known results for this data set.

Shimon Cohen, Nathan Intrator
Types of Multinet System

A limiting factor in research on combining classifiers is a lack of awareness of the full range of available modular structures. One reason for this is that there is as yet little agreement on a means of describing and classifying types of multiple classifier system. In this paper, a categorisation scheme for the identification and description of types of multinet systems is proposed in which systems are described as (a) involving competitive or cooperative combination mechanisms; (b) combining either ensemble, modular, or hybrid components; (c) relying on either bottom-up or top-down combination; and (d) when bottom-up, using either static or fixed combination methods. It is claimed that the categorisation provides an early, but necessary, step in the process of mapping the space of multinet systems: permitting the comparison of different types of system, and facilitating their design and description. On the basis of this scheme, one ensemble and two modular multinet system designs are implemented and applied to an engine fault diagnosis problem. The best generalisation performance was achieved by the ensemble system.

Amanda J. C. Sharkey
Discriminant Analysis and Factorial Multiple Splits in Recursive Partitioning for Data Mining

The framework of this paper is supervised statistical learning in data mining. In particular, multiple sets of inputs are used to predict an output on the basis of a training set. A typical data mining problem is to deal with large sets of within-group correlated inputs relative to the number of observed objects. Standard tree-based procedures offer unstable and uninterpretable solutions, especially in the case of complex relationships, so multiple splits defined on a suitable combination of inputs are required. This paper provides a methodology for building a tree-based model whose node splitting is driven by factorial multiple splitting variables. A recursive partitioning algorithm is introduced, based on a two-stage splitting criterion that uses linear discriminant functions. As a result, an automated and fast procedure makes it possible to search for factorial multiple splits able to capture suitable directions in the variability among the sets of inputs. Real-world applications are discussed and the results of a simulation study are shown to describe fruitful properties of the proposed methodology.

Francesco Mola, Roberta Siciliano

Design Methodologies

New Measure of Classifier Dependency in Multiple Classifier Systems

Recent findings in the domain of combining classifiers provide a surprising revision of the usefulness of diversity for modelling combined performance. Although there is common agreement that a successful fusion system should be composed of accurate and diverse classifiers, experimental results show very weak correlations between various diversity measures and combining methods. In effect, neither the combined performance nor its improvement over the mean classifier performance seems to be measurable in a consistent and well-defined manner. At the same time, the most successful diversity measures, hardly regarded as measuring diversity, are based on measuring error coincidences, and by doing so they move closer to the definitions of the combined errors themselves. Following this trend, we decided to use the combining error directly, normalized within its derivable error limits, as a measure of classifier dependency. Given its simplicity and representativeness, we chose the majority voting error for the construction of the measure. We examine this novel dependency measure for a number of real datasets and classifiers, showing its ability to model combining improvements over the individual mean.

Dymitr Ruta, Bogdan Gabrys
A Discussion on the Classifier Projection Space for Classifier Combining

In classifier combining, one tries to fuse the information that is given by a set of base classifiers. In such a process, one of the difficulties is how to deal with the variability between classifiers. Although various measures and many combining rules have been suggested in the past, the problem of constructing optimal combiners is still heavily studied. In this paper, we discuss and illustrate the possibilities of classifier embedding as a means of analysing the variability of base classifiers, as well as their combining rules. Thereby, a space is constructed in which classifiers can be represented as points. Such a low-dimensional space is a Classifier Projection Space (CPS). In the first instance, it is used to design a visual tool that gives more insight into the differences between various combining techniques. This is illustrated by some examples. Finally, we discuss how the CPS may also be used as a basis for constructing new combining rules.

Elżbieta Pękalska, Robert P. W. Duin, Marina Skurichina
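
One way to realise such an embedding is multidimensional scaling on pairwise disagreement rates. This is a sketch only: the paper leaves the choice of classifier dissimilarity and embedding method open, and disagreement plus metric MDS is just one natural instantiation.

```python
import numpy as np
from sklearn.manifold import MDS

def classifier_projection_space(predictions, n_components=2):
    # predictions: (n_classifiers, n_samples) matrix of crisp labels.
    # Distance between two classifiers = fraction of samples on which
    # they disagree; MDS then places each classifier as a 2-D point.
    P = np.asarray(predictions)
    m = len(P)
    D = np.array([[np.mean(P[i] != P[j]) for j in range(m)]
                  for i in range(m)])
    mds = MDS(n_components=n_components, dissimilarity="precomputed",
              random_state=0)
    return mds.fit_transform(D)           # coordinates for plotting
```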
On the General Application of the Tomographic Classifier Fusion Methodology

We have previously (MCS 2001) presented a mathematical metaphor setting out an equivalence between multiple expert fusion and the process of tomographic reconstruction familiar from medical imaging. However, the discussion took place only in relation to a restricted case: namely, classifiers containing discrete feature sets. This sequel paper therefore endeavours to extend the methodology to the fully general case. The investigation is conducted initially within the context of classical feature selection (that is, selection algorithms that place no restriction upon the overlap of feature sets); the findings demonstrate the need to re-evaluate the role of feature selection when it is conducted within an explicitly combinatorial framework. When fully enunciated, the resulting investigation leads naturally to a completely generalised, morphologically optimal strategy for classifier combination.

D. Windridge, J. Kittler
Post-processing of Classifier Outputs in Multiple Classifier Systems

Incomparability of classifier outputs due to the variability in their scales is a major problem in the combination of different classification systems. To compensate for this, output normalization is generally performed, the main aim being to transform the outputs onto the same scale. In this paper, it is proposed that in selecting the transformation function, the scale-similarity goal should be accompanied by two further requirements. The first is the separability of the pattern classes in the transformed output space, and the second is the compatibility of the outputs with the combination rule. A method of transformation that better satisfies the additional requirements is proposed and is shown to improve the classification performance of both linear and Bayesian combination systems based on the use of confusion-matrix-based a posteriori probabilities…

Hakan Altinçay, Mübeccel Demirekler
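
The scale problem the paper starts from is easy to make concrete. The two standard transforms below only illustrate it; the transformation the paper actually proposes (tuned for class separability and rule compatibility) is not given in the abstract.

```python
import numpy as np

def minmax_norm(scores):
    # Rescale each classifier's output vector to [0, 1].
    s = scores - scores.min(axis=1, keepdims=True)
    return s / (s.max(axis=1, keepdims=True) + 1e-12)

def softmax_norm(scores, t=1.0):
    # Map raw scores to a probability-like scale shared by all experts.
    e = np.exp((scores - scores.max(axis=1, keepdims=True)) / t)
    return e / e.sum(axis=1, keepdims=True)

# Raw outputs of two classifiers on very different scales:
scores = np.array([[2.0, 5.0, 1.0],       # classifier A
                   [0.2, 0.5, 0.3]])      # classifier B
fused = softmax_norm(scores).mean(axis=0) # now a fair linear combination
print(fused.argmax())
```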

Combination Strategies

Trainable Multiple Classifier Schemes for Handwritten Character Recognition

In this paper we propose two novel multiple classifier fusion schemes which, although different in terms of architecture, share the idea of dynamically extracting additional statistical information about the individually trained participant classifiers by reinterpreting their outputs on a validation set. This is achieved by training another classifier, be it a combiner or an intermediate-stage classification device, on the resulting intermediate feature spaces. We subsequently implemented our proposals as multi-classifier systems for handwritten character recognition and compared the performance obtained through a series of cross-validation experiments of increasing difficulty. Our findings strongly suggest that both schemes can successfully overcome the limitations imposed on fixed combination strategies by the requirement of comparable performance levels among their participant classifiers. In addition, the results presented demonstrate the significant gains achieved by our proposals in comparison with both individual classifiers experimentally optimized for the task at hand, and a multi-classifier system design process which incorporates artificial intelligence techniques.

K. Sirlantzis, S. Hoque, M. C. Fairhurst
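
The shared pattern of both schemes, reinterpreting base-classifier outputs on a validation set as a new feature space for a trained combiner, can be sketched as follows; the classifier choices here are placeholders, not those of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def train_fusion(X, y):
    # Base classifiers learn on one split; their posteriors on a
    # held-out validation split become the intermediate feature space
    # on which a combiner is trained.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0)
    bases = [GaussianNB().fit(X_tr, y_tr),
             DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)]
    Z_val = np.hstack([b.predict_proba(X_val) for b in bases])
    combiner = LogisticRegression(max_iter=1000).fit(Z_val, y_val)
    return bases, combiner

def fused_predict(bases, combiner, X_new):
    Z = np.hstack([b.predict_proba(X_new) for b in bases])
    return combiner.predict(Z)
```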
Generating Classifier Ensembles from Multiple Prototypes and Its Application to Handwriting Recognition

There are many examples of classification problems in the literature where multiple classifier systems increase the performance over single classifiers. Normally, one of the following two approaches is used to create a multiple classifier system: (1) several classifiers are developed completely independently of each other and combined in a final step; (2) several classifiers are created from one base classifier by using so-called classifier ensemble creation methods. In this paper, algorithms which combine both approaches are introduced and experimentally evaluated in the context of a hidden Markov model (HMM) based handwritten word recognizer.

Simon Günter, Horst Bunke
Adaptive Feature Spaces for Land Cover Classification with Limited Ground Truth Data

Classification of hyperspectral data is challenging because of high dimensionality (O(100)) inputs, several possible output classes with uneven priors, and scarcity of labeled information. In an earlier work, a multiclassifier system arranged as a binary hierarchy was developed to group classes for easier, progressive discrimination [27]. This paper substantially expands the scope of such a system by integrating a feature reduction scheme that adaptively adjusts to the amount of labeled data available, while exploiting the highly correlated nature of certain adjacent hyperspectral bands. The resulting best-basis binary hierarchical classifier (BB-BHC) family is thus able to address the “small sample size” problem, as evidenced by our experimental results.

Joseph T. Morgan, Alex Henneguelle, Melba M. Crawford, Joydeep Ghosh, Amy Neuenschwander
Stacking with Multi-response Model Trees

We empirically evaluate several state-of-the-art methods for constructing ensembles of classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross-validation. We then propose a new stacking method that uses multi-response model trees at the meta-level, and show that it outperforms existing stacking approaches, as well as selecting the best classifier from the ensemble by cross-validation.

Sašo Džeroski, Bernard Ženko
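
Scikit-learn does not implement multi-response model trees, but the overall stacking architecture can be approximated with an ordinary tree at the meta-level; treat this purely as a sketch of the setup, not the paper's method.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Meta-level features are the base learners' cross-validated class
# probabilities; the decision tree stands in for the paper's
# multi-response model trees.
stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("knn", KNeighborsClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=DecisionTreeClassifier(max_depth=3, random_state=0),
    stack_method="predict_proba", cv=5)

print(cross_val_score(stack, X, y, cv=5).mean())
```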
On Combining One-Class Classifiers for Image Database Retrieval

In image retrieval systems, images can be represented by single feature vectors or by clouds of points. A cloud of points offers a more flexible description but suffers from class overlap. We propose a novel approach for describing clouds of points based on support vector data description (SVDD). We show that combining SVDD-based classifiers improves the retrieval precision. We investigate the performance of the proposed retrieval technique on a database of 368 texture images and compare it to other methods.

Carmen Lai, David M. J. Tax, Robert P. W. Duin, Elżbieta Pękalska, Pavel Paclík
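
A rough sketch of the retrieval scheme, with scikit-learn's OneClassSVM (closely related to SVDD when both use a Gaussian kernel) standing in for the paper's SVDD; the data and parameters are toy placeholders.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_image_model(cloud, nu=0.1, gamma=0.5):
    # One one-class model per database image, fitted on its cloud of
    # feature points.
    return OneClassSVM(nu=nu, gamma=gamma).fit(cloud)

def match_scores(models, query_cloud):
    # Average signed distance of the query's points to each image's
    # boundary: higher means a better match.
    return [m.decision_function(query_cloud).mean() for m in models]

db = [np.random.randn(50, 8) + i for i in range(3)]   # toy "images"
models = [fit_image_model(c) for c in db]
query = np.random.randn(50, 8) + 1.0
print(np.argsort(match_scores(models, query))[::-1])  # ranked retrieval
```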

Analysis and Performance Evaluation

Bias-Variance Analysis and Ensembles of SVM

Accuracy, diversity, and learning characteristics of base learners critically influence the effectiveness of ensemble methods. Bias-variance decomposition of the error can be used as a tool to gain insights into the behavior of learning algorithms, in order to properly design ensemble methods well-tuned to the properties of a specific base learner. In this work we analyse bias-variance decomposition of the error in Support Vector Machines (SVM), characterizing it with respect to the kernel and its parameters. We show that the bias-variance decomposition offers a rationale to develop ensemble methods using SVMs as base learners, and we outline two directions for developing SVM ensembles, exploiting the SVM bias characteristics and the bias-variance dependence on the kernel parameters.

Giorgio Valentini, Thomas G. Dietterich
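
A bias-variance decomposition of 0/1 loss can be estimated empirically from bootstrap replicates. The Domingos-style scheme below is one common formulation and may differ in detail from the paper's; it assumes non-negative integer class labels and illustrative parameters.

```python
import numpy as np
from sklearn.svm import SVC

def bias_variance_01(X_tr, y_tr, X_te, y_te, n_boot=50, **svm_params):
    # Retrain the SVM on bootstrap replicates; the per-point majority
    # vote is the "main prediction". Bias = error of the main
    # prediction; variance = mean disagreement with it.
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_tr), len(X_tr))
        svm = SVC(**svm_params).fit(X_tr[idx], y_tr[idx])
        preds.append(svm.predict(X_te))
    preds = np.array(preds, dtype=int)    # (n_boot, n_test)
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    return np.mean(main != y_te), np.mean(preds != main[None, :])

# e.g. sweep the RBF kernel width:
#   for g in (0.01, 0.1, 1.0):
#       print(g, bias_variance_01(X_tr, y_tr, X_te, y_te, gamma=g))
```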
An Experimental Comparison of Fixed and Trained Fusion Rules for Crisp Classifier Outputs

At present, fixed rules for classifier combination are the most used and widely investigated ones, while the study and application of trained rules has received much less attention. Therefore, the pros and cons of fixed and trained rules are only partially known, even if one focuses on crisp classifier outputs. In this paper, we report the results of an experimental comparison of well-known fixed and trained rules for crisp classifier outputs. The reported experiments allow one to draw some preliminary conclusions about the comparative advantages of fixed and trained fusion rules.

Fabio Roli, Šarūnas Raudys, Gian Luca Marcialis
Reduction of the Boasting Bias of Linear Experts

If no large design data set is available for designing a multiple classifier system, one typically uses the same data set to design both the expert classifiers and the fusion rule. In that case, the experts' outputs form optimistically biased training data for the fusion rule designer. We consider standard Fisher linear and Euclidean distance classifiers used as experts and the single-layer perceptron as a fusion rule. Original bias correction terms for the experts' answers are derived for these two types of expert classifiers under assumptions of high-variate Gaussian distributions. In addition, noise injection is presented as a more universal technique. Experiments with specially designed artificial Gaussian data and real-world medical data showed that the theoretical bias correction works well in the case of high-variate artificial data, while the noise injection technique is preferable for real-world problems.

Arūnas Janeliūnas, Šarūnas Raudys
Analysis of Linear and Order Statistics Combiners for Fusion of Imbalanced Classifiers

So far, few theoretical works have investigated the conditions under which specific fusion rules can work well, and a unifying framework for comparing rules of different complexity is clearly beyond the state of the art. A clear theoretical comparison is lacking even if one focuses on specific classes of combiners (e.g., linear combiners). In this paper, we theoretically compare simple and weighted averaging rules for the fusion of imbalanced classifiers. Continuing the work reported in [10], we gain a deeper understanding of the effects of classifier imbalance in linear combiners. In addition, we experimentally compare the performance of linear and order statistics combiners for ensembles with different degrees of classifier imbalance.

Fabio Roli, Giorgio Fumera
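
The rule families being compared are one-liners over the matrix of soft outputs; a compact reference, with the example scores and weights purely illustrative:

```python
import numpy as np

# scores: (n_classifiers, n_classes) soft outputs for one test pattern.
def simple_average(scores):      return scores.mean(axis=0)
def weighted_average(scores, w): return np.average(scores, axis=0, weights=w)
def os_min(scores):              return scores.min(axis=0)
def os_max(scores):              return scores.max(axis=0)
def os_median(scores):           return np.median(scores, axis=0)

scores = np.array([[0.7, 0.3],   # a strong expert
                   [0.4, 0.6],   # a weaker, imbalanced expert
                   [0.9, 0.1]])
w = np.array([0.5, 0.1, 0.4])    # weights can favour stronger experts
for rule in (simple_average, os_median, os_max):
    print(rule.__name__, rule(scores).argmax())
print("weighted_average", weighted_average(scores, w).argmax())
```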

Applications

Boosting and Classification of Electronic Nose Data

Boosting methods are known to improve the generalization performance of learning algorithms by reducing both bias and variance or by enlarging the margin of the resulting multi-classifier system. In this contribution we applied AdaBoost to the discrimination of different types of coffee using data produced with an electronic nose. Two groups of coffees (blends and monovarieties), consisting of seven classes each, were analyzed. The boosted ensemble of multi-layer perceptrons was able to halve the classification error for the blends data and to reduce it from 21% to 18% for the more difficult monovarieties data set.

Francesco Masulli, Matteo Pardo, Giorgio Sberveglieri, Giorgio Valentini
Content-Based Classification of Digital Photos

Annotating images with a description of the content can facilitate the organization, storage and retrieval of image databases. It can also be useful in processing images, by taking into account the scene depicted, in intelligent scanners, digital cameras, photocopiers, and printers. We present here our experimentation on indoor/outdoor/close-up content-based image classification. More specifically, we show that it is possible to relate low-level visual features to semantic photo categories, such as indoor, outdoor and close-up, using tree classifiers. We have designed and experimentally compared several classification strategies, producing a classifier that can provide a reasonably good performance on a generic photograph database.

R. Schettini, C. Brambilla, C. Cusano
Classifier Combination for In Vivo Magnetic Resonance Spectra of Brain Tumours

In this paper we present a multi-stage classifier for magnetic resonance spectra of human brain tumours which is being developed as part of a decision support system for radiologists. The basic idea is to decompose a complex classification scheme into a sequence of classifiers, each specialising in different classes of tumours and trying to reproduce part of the WHO classification hierarchy. Each stage uses a particular set of classification features, which are selected using a combination of classical statistical analysis, splitting performance and previous knowledge. Classifiers with different behaviour are combined using a simple voting scheme in order to extract different error patterns: LDA, decision trees and the k-NN classifier. A special label named “unknown” is used when the outcomes of the different classifiers disagree. Cascading is also used to incorporate class distances computed using LDA into decision trees. Both cascading and voting are effective tools to improve classification accuracy. Experiments also show that it is possible to extract useful information from the classification process itself in order to help users (clinicians and radiologists) to make more accurate predictions and reduce the number of possible classification mistakes.

Julià Minguillón, Anne Rosemary Tate, Carles Arús, John R. Griffiths
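
The "unknown" mechanism mentioned above is a small but useful pattern; a sketch for three experts with crisp outputs (the tumour labels are invented examples):

```python
from collections import Counter

def vote_with_unknown(decisions):
    # decisions: crisp labels from the three combined classifiers
    # (e.g. LDA, a decision tree, k-NN). With three voters, any label
    # backed by at least two wins; total disagreement -> "unknown".
    label, count = Counter(decisions).most_common(1)[0]
    return label if count >= 2 else "unknown"

print(vote_with_unknown(["glioma", "glioma", "meningioma"]))   # glioma
print(vote_with_unknown(["glioma", "meningioma", "normal"]))   # unknown
```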
Combining Classifiers of Pesticides Toxicity through a Neuro-fuzzy Approach

The increasing amount and complexity of data used in toxicity prediction calls for new approaches based on hybrid intelligent methods for mining the data. Such approaches are needed all the more given the increasing number of different classifiers applied in toxicity prediction. Consequently, there is a need to develop tools to integrate the various approaches. The goal of this research is to apply neuro-fuzzy networks to improve the combination of the results of five classifiers applied to pesticide toxicity. Moreover, fuzzy rules extracted from the trained networks can be used to perform useful comparisons between the performances of the classifiers involved. Our results suggest that the neuro-fuzzy approach to combining classifiers has the potential to significantly improve common classification methods for use in pesticide toxicity characterization and knowledge discovery.

Emilio Benfenati, Paolo Mazzatorta, Daniel Neagu, Giuseppina Gini
A Multi-expert System for Movie Segmentation

In this paper we present a system for movie segmentation based on the automatic detection of dialogue scenes. The proposed system processes the video stream directly in the MPEG domain: it starts with the segmentation of the video footage into shots. Then, each shot is characterized as dialogue or non-dialogue by a Multi-Expert System (MES). Finally, the identified sequences of shots are aggregated into dialogue scenes by means of a suitable algorithm. The MES integrates three experts, which classify a given shot on the basis of highly complementary descriptions; in particular, an audio classifier, a face detector and a camera motion estimator have been built and employed. The performance of the system has been tested on a large MPEG movie database made up of more than 15,000 shots and 200 scenes, giving rise to encouraging results.

L. P. Cordella, M. De Santo, G. Percannella, C. Sansone, M. Vento
Decision Level Fusion of Intramodal Personal Identity Verification Experts

We investigate the Behavior Knowledge Space [4] and Decision Templates [7] methods of classifier fusion in the context of personal identity verification involving six intramodal experts exploiting frontal face biometrics. The results of extensive experiments on the XM2VTS database show that the Behavior Knowledge Space fusion strategy achieves consistently better results than the Decision Templates method. Most importantly, it exhibits quasi-monotonic behaviour as the number of experts combined increases.

J. Kittler, M. Ballette, J. Czyz, F. Roli, L. Vandendorpe
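
The Behavior Knowledge Space method itself is a lookup table over the joint crisp decisions of the experts; a minimal implementation (the fallback rule for empty cells is our choice, not necessarily the paper's):

```python
from collections import Counter, defaultdict

class BKSFusion:
    # Index a table by the tuple of expert decisions observed on
    # training/validation data; each cell answers with its most
    # frequent true label.
    def fit(self, expert_decisions, y):
        cells = defaultdict(Counter)
        for key, label in zip(expert_decisions, y):
            cells[tuple(key)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in cells.items()}
        return self

    def predict(self, expert_decisions):
        out = []
        for key in expert_decisions:
            key = tuple(key)
            if key in self.table:
                out.append(self.table[key])
            else:  # unseen cell: fall back to a simple majority vote
                out.append(Counter(key).most_common(1)[0][0])
        return out
```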
An Experimental Comparison of Classifier Fusion Rules for Multimodal Personal Identity Verification Systems

In this paper, an experimental comparison between fixed and trained fusion rules for multimodal personal identity verification is reported. We focused on the behaviour of the considered fusion methods for ensembles of classifiers exhibiting significantly different performance, as this is one of the main characteristics of multimodal biometric systems. The experiments were carried out on the XM2VTS database, using eight experts based on speech and face data. As fixed fusion methods, we considered the sum, majority voting, and order statistics based rules. The trained methods considered are the Behavior Knowledge Space and the weighted averaging of classifier outputs.

Fabio Roli, Josef Kittler, Giorgio Fumera, Daniele Muntoni
Backmatter
Metadata
Title
Multiple Classifier Systems
Edited by
Fabio Roli
Josef Kittler
Copyright year
2002
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-45428-1
Print ISBN
978-3-540-43818-2
DOI
https://doi.org/10.1007/3-540-45428-4