Classifier Ensembles (I)

Weighted Bagging for Graph Based One-Class Classifiers

Most conventional learning algorithms require both positive and negative training data for achieving accurate classification results. However, the problem of learning classifiers from only positive data arises in many applications where negative data are too costly, difficult to obtain, or not available at all. Minimum Spanning Tree Class Descriptor (MST_CD) was presented as a method that achieves better accuracies than other one-class classifiers in high dimensional data. However, the presence of outliers in the target class severely harms the performance of this classifier. In this paper we propose two bagging strategies for MST_CD that reduce the influence of outliers in training data. We show the improved performance on both real and artificially contaminated data.

Santi Seguí, Laura Igual, Jordi Vitrià

Improving Multilabel Classification Performance by Using Ensemble of Multi-label Classifiers

Multilabel classification is a challenging research problem in which each instance is assigned to a subset of labels. Recently, a considerable amount of research has been concerned with the development of “good” multi-label learning methods. Despite the extensive research effort, many scientific challenges posed by, e.g., highly imbalanced training sets and correlation among labels remain to be addressed. The aim of this paper is to use a heterogeneous ensemble of multi-label learners to simultaneously tackle both the imbalance and the correlation problems. This differs from existing work in that the latter mainly focuses on ensemble techniques within a multi-label learner, while we propose to combine these state-of-the-art multi-label methods by ensemble techniques. The proposed ensemble approach (EML) is applied to three publicly available multi-label data sets using several evaluation criteria. We validate the advocated approach experimentally and demonstrate that it yields significant performance gains when compared with state-of-the-art multi-label methods.

Muhammad Atif Tahir, Josef Kittler, Krystian Mikolajczyk, Fei Yan

New Feature Splitting Criteria for Co-training Using Genetic Algorithm Optimization

Often in real-world applications only a small amount of labeled data is available while unlabeled data is abundant. Therefore, it is important to make use of unlabeled data. Co-training is a popular semi-supervised learning technique that uses a small set of labeled data and enough unlabeled data to create more accurate classification models. A key requirement for successful co-training is to split the features among more than one view. In this paper we propose new splitting criteria based on the confidence of the views and the diversity of the views, and compare them to random and natural splits. We also examine a previously proposed artificial split that maximizes the independence between the views, and propose a mixed criterion for splitting features based on both the confidence and the independence of the views. Genetic algorithms are used to choose the splits which optimize the independence of the views given the class, the confidence of the views in their predictions, and the diversity of the views. We demonstrate that our proposed splitting criteria improve the performance of co-training.

Ahmed Salaheldin, Neamat El Gayar

Incremental Learning of New Classes in Unbalanced Datasets: Learn++.UDNC

We have previously described an incremental learning algorithm, Learn++.NC, for learning from new datasets that may include new concept classes without accessing previously seen data. We now propose an extension, Learn++.UDNC, that allows the algorithm to incrementally learn new concept classes from unbalanced datasets. We describe the algorithm in detail, and provide some experimental results on two separate representative scenarios (on synthetic as well as real-world data) along with comparisons to other approaches for incremental learning and/or learning from unbalanced datasets.

Gregory Ditzler, Michael D. Muhlbaier, Robi Polikar

Tomographic Considerations in Ensemble Bias/Variance Decomposition

Classifier decision fusion has been shown to act in a manner analogous to the back-projection of Radon transformations when individual classifier feature sets are non-overlapping or partially overlapping. It is possible, via this analogy, to demonstrate that standard linear classifier fusion introduces a morphological bias into the decision space due to the implicit angular undersampling of the feature selection process. In standard image-based (e.g. medical) tomography, removal of this bias involves a filtration process, and an analogous n-dimensional process can be shown to exist for decision fusion using Högbom deconvolution.

Countering the biasing process implicit in linear fusion, however, is the fact that back-projection of Radon transformations (being additive) should act to reduce variance within the composite decision space. In principle, this additive variance reduction should still apply to tomographically-filtered back-projection, unless the filtration process contravenes.

We therefore argue that when feature selection is carried out independently for each classifier (as in, e.g., multi-modal problems), unfiltered decision fusion, while in general being variance-decreasing, is typically also bias-increasing. By employing a shot noise model, we seek to quantify how far filtration acts to rectify this problem, such that feature selection can be made both bias- and variance-reducing within an ensemble fusion context.

David Windridge

Choosing Parameters for Random Subspace Ensembles for fMRI Classification

Functional magnetic resonance imaging (fMRI) is a non-invasive and powerful method for analysis of the operational mechanisms of the brain. fMRI classification poses a severe challenge because of the extremely large feature-to-instance ratio. Random Subspace ensembles (RS) have been found to work well for such data. To enable a theoretical analysis of RS ensembles, we assume that only a small (known) proportion of the features are important to the classification, and the remaining features are noise. Three properties of RS ensembles are defined: usability, coverage and feature-set diversity. Their expected values are derived for a range of RS ensemble sizes (L) and cardinalities of the sampled feature subsets (M). Our hypothesis that larger values of the three properties are beneficial for RS ensembles was supported by a simulation study and an experiment with a real fMRI data set. The analyses suggested that RS ensembles benefit from medium M and relatively small L.

Ludmila I. Kuncheva, Catrin O. Plumpton
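
The sampling scheme assumed in the abstract above can be illustrated with a minimal sketch. It draws L feature subsets of cardinality M from a feature pool in which a small, known set of features is taken to be important. The function names `sample_subspaces` and `fraction_covered` are hypothetical, and the quantity computed is only an informal illustration; it is not claimed to match the paper's definitions of usability, coverage or feature-set diversity.

```python
import random

def sample_subspaces(n_features, L, M, seed=0):
    """Draw L feature subsets of size M, each sampled without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), M)) for _ in range(L)]

def fraction_covered(subspaces, important):
    """Share of the important features that appear in at least one subset."""
    seen = set().union(*subspaces)
    important = set(important)
    return len(seen & important) / len(important)

# Assumed toy setting: 1000 features, of which the first 20 are important.
subspaces = sample_subspaces(n_features=1000, L=10, M=50)
print(fraction_covered(subspaces, range(20)))
```

Varying `L` and `M` in such a simulation is one way to explore how ensemble size and subset cardinality interact, under the stated assumption that the important features are known.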

Classifier Ensembles (II)

An Experimental Study on Ensembles of Functional Trees

Functional Trees are one type of multivariate trees. This work studies the performance of different ensemble methods (Bagging, Random Subspaces, AdaBoost, Rotation Forest) using three variants (multivariate internal nodes, multivariate leaves or both) of these trees as base classifiers. The best results, for all the ensemble methods, are obtained using Functional Trees with multivariate leaves and univariate internal nodes. The best overall configuration is obtained with Rotation Forest. Ensembles of Functional Trees are compared to ensembles of univariate Decision Trees, with the results favouring the variant of Functional Trees with univariate internal nodes and multivariate leaves. Kappa-error diagrams are used to study the diversity and accuracy of the base classifiers.

Juan J. Rodríguez, César García-Osorio, Jesús Maudes, José Francisco Díez-Pastor

Multiple Classifier Systems under Attack

In adversarial classification tasks like spam filtering, intrusion detection in computer networks and biometric authentication, a pattern recognition system must not only be accurate, but also robust to manipulations of input samples made by an adversary to mislead the system itself. It has recently been argued that the robustness of a classifier could be improved by avoiding overemphasizing or underemphasizing input features on the basis of training data, since in the operation phase the feature importance may change due to modifications introduced by the adversary. In this paper we empirically investigate whether the well-known bagging and random subspace methods can improve the robustness of linear base classifiers by producing more uniform weight values. To this end we use a method for performance evaluation of a classifier under attack that we are currently developing, and carry out experiments on a spam filtering task with several linear base classifiers.

Battista Biggio, Giorgio Fumera, Fabio Roli

SOCIAL: Self-Organizing ClassIfier ensemble for Adversarial Learning

Pattern recognition techniques are often used in environments (called adversarial environments) where adversaries can consciously act to limit or prevent accurate recognition performance. This can be achieved, for example, by changing labels of training data in a malicious way.

While Multiple Classifier Systems (MCS) are currently used in several security applications, like intrusion detection in computer networks and spam filtering, there are very few MCS proposals that explicitly address the problem of learning in adversarial environments. In this paper we propose a general algorithm based on a multiple classifier approach to identify and clean mislabeled training samples. We report several experiments to verify the robustness of the proposed approach to the presence of possible mislabeled samples. In particular, we show that the performance obtained with a simple classifier trained on the training set “cleaned” by our algorithm is comparable to, or even better than, that obtained by some state-of-the-art MCS trained on the original datasets.

Francesco Gargiulo, Carlo Sansone

Unsupervised Change-Detection in Retinal Images by a Multiple-Classifier Approach

The aim of this work is the development of an unsupervised method for the detection of the changes that occurred in multitemporal digital images of the fundus of the human retina, in terms of white and red spots. The images are acquired from the same patient at different times by a fundus camera. The proposed method is an unsupervised multiple classifier approach, based on a minimum-error thresholding technique. This technique is applied to separate the “change” and the “no-change” areas in a suitably defined difference image. In particular, the thresholding approach is applied to selected sub-images: the outputs of the different windows are combined with a majority vote approach, in order to cope with local illumination differences. A quantitative assessment of the change detection performances suggests that the proposed method is able to provide accurate change maps, although possibly affected by misregistration errors or calibration/acquisition artifacts. The comparison between the results obtained using the implemented multiple classifier approach and a standard one points out that the proposed algorithm provides an accurate detection of the temporal changes.

Giulia Troglio, Marina Alberti, Jón Atli Benediktsson, Gabriele Moser, Sebastiano Bruno Serpico, Einar Stefánsson

A Double Pruning Algorithm for Classification Ensembles

This article introduces a double pruning algorithm that can be used to reduce the storage requirements, speed up the classification process and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the ensemble assigns to a given test instance by polling only a fraction of its classifiers. Instead of applying this form of dynamical (instance-based) pruning to the original ensemble, we propose to apply it to a subset of classifiers selected using standard ensemble pruning techniques. The pruned subensemble is built by first modifying the order in which classifiers are aggregated in the ensemble and then selecting the first classifiers in the ordered sequence. Experiments on benchmark problems illustrate the improvements that can be obtained with this technique. Specifically, using a bagging ensemble of 101 CART trees as a starting point, only the 21 trees of the pruned ordered ensemble need to be stored in memory. Depending on the classification task, on average, only 5 to 12 of these 21 classifiers are queried to compute the predictions. The generalization performance achieved by this double pruning algorithm is similar to that of pruned ordered bagging and significantly better than that of standard bagging.

Víctor Soto, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato, Alberto Suárez
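
The idea of polling only a fraction of an ordered subensemble, as described in the abstract above, can be sketched as follows. This is a minimal illustration under a loud assumption: it stops querying as soon as the remaining votes can no longer change the majority, which is a simple deterministic stopping rule chosen for clarity and not necessarily the estimation procedure used in the paper. The function name `vote_with_early_stop` and the toy classifiers are hypothetical.

```python
from collections import Counter

def vote_with_early_stop(ordered_classifiers, x):
    """Query classifiers in order; stop once the majority cannot change.

    Returns (predicted_label, number_of_classifiers_queried).
    Assumption: a deterministic stopping rule, used here for illustration only.
    """
    total = len(ordered_classifiers)
    votes = Counter()
    for queried, clf in enumerate(ordered_classifiers, start=1):
        votes[clf(x)] += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        remaining = total - queried
        # If every remaining vote went to the runner-up (or to a label not yet
        # seen), the current leader would still win, so polling can stop.
        if lead - runner_up > remaining:
            return ranked[0][0], queried
    return votes.most_common(1)[0][0], total

# Toy subensemble of 21 members that all predict label 1.
subensemble = [lambda x: 1 for _ in range(21)]
print(vote_with_early_stop(subensemble, x=None))  # (1, 11)
```

With unanimous members, 11 of the 21 votes are enough to fix the majority; less unanimous subensembles need more queries.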

Estimation of the Number of Clusters Using Multiple Clustering Validity Indices

One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools used to address this problem. A large number of CVIs have been proposed, and reports that compare different CVIs suggest that no single CVI can always outperform the others. Following suggestions found in prior art, in this paper we formalize the concept of using multiple CVIs for cluster number estimation in the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy in estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.

Krzysztof Kryszczuk, Paul Hurley

Classifier Diversity

“Good” and “Bad” Diversity in Majority Vote Ensembles

Although diversity in classifier ensembles is desirable, its relationship with the ensemble accuracy is not straightforward. Here we derive a decomposition of the majority vote error into three terms: average individual accuracy, “good” diversity and “bad” diversity. The good diversity term is taken out of the individual error whereas the bad diversity term is added to it. We relate the two diversity terms to the majority vote limits defined previously (the patterns of success and failure). A simulation study demonstrates how the proposed decomposition can be used to gain insights about majority vote classifier ensembles.

Gavin Brown, Ludmila I. Kuncheva
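
A small numerical check can make the decomposition described in the abstract above concrete. The sketch below works with oracle outputs (whether each classifier is correct on each example) and measures diversity as the fraction of classifiers that disagree with the ensemble. These choices are assumptions made for illustration; the paper's exact formulation may differ. The function name `decompose_majority_error` is hypothetical.

```python
def decompose_majority_error(correct):
    """correct[i][t] is True when classifier t labels example i correctly.

    Assumes an odd number of classifiers, so the majority vote never ties.
    Returns (e_maj, e_ind, good, bad), averaged over the examples.
    """
    n = len(correct)
    e_maj = e_ind = good = bad = 0.0
    for row in correct:
        wrong = row.count(False) / len(row)  # fraction of wrong classifiers
        e_ind += wrong / n
        if wrong < 0.5:
            # Ensemble is correct: the wrong members disagree with it.
            good += wrong / n
        else:
            # Ensemble is wrong: the correct members disagree with it.
            e_maj += 1.0 / n
            bad += (1.0 - wrong) / n
    return e_maj, e_ind, good, bad

oracle = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
e_maj, e_ind, good, bad = decompose_majority_error(oracle)
# Majority vote error = individual error - good diversity + bad diversity.
print(e_maj, e_ind - good + bad)
```

Disagreement on examples where the ensemble is right lowers the error relative to the average member, while disagreement on examples where the ensemble is wrong is wasted, which matches the roles the abstract assigns to the two diversity terms.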

Multi-information Ensemble Diversity

Understanding ensemble diversity is one of the most important fundamental issues in ensemble learning. Inspired by recent work that tries to explain ensemble diversity from an information-theoretic perspective, in this paper we study ensemble diversity from the viewpoint of multi-information. We show that from this viewpoint, the ensemble diversity can be decomposed over the component classifiers constituting the ensemble. Based on this formulation, an approximation is given for estimating the diversity in practice. Experimental results show that our formulation and approximation are promising.

Zhi-Hua Zhou, Nan Li

Classifier Selection

Dynamic Selection of Ensembles of Classifiers Using Contextual Information

In a multiple classifier system, dynamic selection (DS) has been used successfully to choose only the best subset of classifiers to recognize the test samples. Dos Santos et al.’s approach (DSA) looks very promising for performing DS, since it presents a general solution for a wide range of classifiers. Aiming to improve the performance of DSA, we propose a context-based framework that exploits the internal sources of knowledge embedded in this method. Named $\mbox{DSA}^{c}$, the proposed approach takes advantage of the evidence provided by the base classifiers to define the best set of ensembles of classifiers to recognize each test sample, by means of contextual information provided by the validation set. In addition, we propose a switch mechanism to deal with tie-breaking and low-margin decisions. Experiments on two handwriting recognition problems have demonstrated that the proposed approach generally presents better results than DSA, showing the effectiveness of the proposed enhancements. In addition, we demonstrate that the proposed method can be used, without changing the parameters of the base classifiers, in an incremental learning (IL) scenario, suggesting that it is also a promising general IL approach. Finally, the use of a filtering method shows that we can significantly reduce the complexity of $\mbox{DSA}^{c}$ in the same IL scenario while even increasing the final performance.

Paulo R. Cavalin, Robert Sabourin, Ching Y. Suen

Selecting Structural Base Classifiers for Graph-Based Multiple Classifier Systems

Selecting a set of good and diverse base classifiers is essential for building multiple classifier systems. However, almost all commonly used procedures for selecting such base classifiers cannot be directly applied to select structural base classifiers. The main reason is that structural data cannot be represented in a vector space.

For graph-based multiple classifier systems, only using subgraphs for building structural base classifiers has been considered so far. However, in theory, a full graph preserves more information than its subgraphs. Therefore, in this work, we propose a different procedure which can transform a labelled graph into a new set of unlabelled graphs and preserve all the linkages at the same time. By embedding the label information into edges, we can further ignore the labels. By assigning weights to the edges according to the labels of their linked nodes, the strengths of the connections are altered, but the topology of the graph as a whole is preserved.

Since it is very difficult to embed graphs into a vector space, graphs are usually classified based on pairwise graph distances. We adopt the dissimilarity representation and build the structural base classifiers based on labels in the dissimilarity space. By combining these structural base classifiers, we can solve the labelled graph classification problem with a multiple classifier system. The performance of using the subgraphs and full graphs to build multiple classifier systems is compared in a number of experiments.

Wan-Jui Lee, Robert P. W. Duin, Horst Bunke

Combining Multiple Kernels

A Support Kernel Machine for Supervised Selective Combining of Diverse Pattern-Recognition Modalities

The Support Kernel Machine (SKM) and the Relevance Kernel Machine (RKM) are two principles for selectively combining object-representation modalities of different kinds by means of incorporating supervised selectivity into the classical kernel-based SVM. The former principle consists in rigidly selecting a subset of presumably informative support kernels and excluding the others, whereas the latter one assigns positive weights to all of them. The RKM algorithm was fully elaborated in previous publications; however the previous algorithm implementing the SKM principle of selectivity supervision is applicable only to real-valued features. The present paper fills in this gap by harnessing the framework of subdifferential calculus for computationally solving the problem of constrained nondifferentiable convex optimization that occurs in the SKM training criterion applicable to arbitrary kernel-based modalities of object representation.

Alexander Tatarchuk, Eugene Urlov, Vadim Mottl, David Windridge

Combining Multiple Kernels by Augmenting the Kernel Matrix

In this paper we present a novel approach to combining multiple kernels where the kernels are computed from different information channels. In contrast to traditional methods that learn a linear combination of n kernels of size m × m, resulting in m coefficients in the trained classifier, we propose a method that can learn n × m coefficients. This makes it possible to assign a different importance to each information channel per example rather than per kernel. We analyse the proposed kernel combination in empirical feature space and provide its geometrical interpretation. We validate the approach on both UCI datasets and an object recognition dataset, and demonstrate that it leads to classification improvements.

Fei Yan, Krystian Mikolajczyk, Josef Kittler, Muhammad Atif Tahir
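
For reference, the traditional baseline mentioned in the abstract above, a linear combination of n kernel matrices of size m × m, can be sketched in a few lines. This illustrates only the baseline and makes no attempt to reproduce the proposed augmented-matrix method. The function name `combine_kernels` and the toy matrices and weights are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of n kernel matrices, each of size m x m.

    One weight per kernel (information channel), shared by all examples.
    """
    m = len(kernels[0])
    combined = [[0.0] * m for _ in range(m)]
    for K, w in zip(kernels, weights):
        for i in range(m):
            for j in range(m):
                combined[i][j] += w * K[i][j]
    return combined

# Two toy 2 x 2 kernel matrices standing in for two information channels.
K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[2.0, 2.0], [2.0, 2.0]]
print(combine_kernels([K1, K2], weights=[0.5, 0.25]))  # [[1.0, 0.5], [0.5, 1.0]]
```

Because each weight multiplies a whole matrix, every example sees the same relative importance of the channels; the abstract contrasts this with assigning importance per example.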

Boosting and Bootstrapping

Class-Separability Weighting and Bootstrapping in Error Correcting Output Code Ensembles

A method for applying weighted decoding to error-correcting output code ensembles of binary classifiers is presented. This method is sensitive to the target class in that a separate weight is computed for each base classifier and target class combination. Experiments on 11 UCI datasets show that the method tends to improve classification accuracy when using neural network or support vector machine base classifiers. It is further shown that weighted decoding combines well with the technique of bootstrapping to improve classification accuracy still further.

R. S. Smith, T. Windeatt
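
Weighted decoding with a separate weight for each base classifier and target class, as described in the abstract above, can be sketched as follows. The decoding rule used here (a weighted L1 distance between the classifier outputs and each class codeword) and the way the weights are supplied are assumptions for illustration; the class-separability computation of the weights is not reproduced. The function name `weighted_decode` and all numbers are hypothetical.

```python
def weighted_decode(outputs, code_matrix, weights):
    """Pick the class whose codeword is closest to the classifier outputs.

    outputs[j]        soft output of base classifier j, in [0, 1]
    code_matrix[c][j] target bit of classifier j for class c (0 or 1)
    weights[c][j]     weight of classifier j when scoring class c
    Assumption: distance is a weighted L1 distance.
    """
    distances = []
    for codeword, class_weights in zip(code_matrix, weights):
        d = sum(w * abs(s - z)
                for w, s, z in zip(class_weights, outputs, codeword))
        distances.append(d)
    return min(range(len(distances)), key=distances.__getitem__)

code_matrix = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
outputs = [0.6, 0.4, 0.1]
uniform = [[1.0, 1.0, 1.0]] * 3
print(weighted_decode(outputs, code_matrix, uniform))  # 2

# Down-weighting classifier 0 when scoring class 1 changes the decision.
class_specific = [[1.0, 1.0, 1.0], [0.1, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(weighted_decode(outputs, code_matrix, class_specific))  # 1
```

The second call shows why class-specific weights matter: the same classifier outputs decode to a different class once a classifier judged unreliable for one class contributes less to that class's distance.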

Boosted Geometry-Based Ensembles

Geometry-based ensembles is a newly proposed algorithm based on the concept of characterizing boundary points (CBPs). These points are found from the geometry of the data set and belong to the optimal boundary between classes under a certain notion of robustness. The characterizing boundary points can be used to build a classifier. Based on these points, a set of locally robust linear classifiers is defined and assembled in an additive model to create a final decision rule. As a result, a strong classifier able to compete with current state-of-the-art classifiers is obtained. The main drawback of the original proposal is that the complexity of the created model can be arbitrarily high and depends on the data set. Moreover, outliers and noise may increase this complexity. In this article, small-complexity models with strong generalization capability are explored. Two incremental non-parametric additive building algorithms are considered: boosting and least squared residual fitting approaches. Moreover, the latter method is extended to deal with incremental L2-penalized solutions (which implicitly combine the advantages of sparse models and smooth ones due to the complexity limit). The validation of the approach on the UCI database achieves very promising results, supporting the validity of ensembles of CBP-based classifiers.

Oriol Pujol

Online Non-stationary Boosting

Oza’s Online Boosting algorithm provides a version of AdaBoost which can be trained in an online way for stationary problems. One perspective is that this enables the power of the boosting framework to be applied to datasets which are too large to fit into memory. The online boosting algorithm assumes the data distribution to be independent and identically distributed (i.i.d.) and therefore has no provision for concept drift. We present an algorithm called Online Non-Stationary Boosting (ONSBoost) that, like Online Boosting, uses a static ensemble size without generating new members each time new examples are presented, and also adapts to a changing data distribution. We evaluate the new algorithm against Online Boosting, using the STAGGER dataset and three challenging datasets derived from a learning problem inside a parallelising virtual machine. We find that the new algorithm provides equivalent performance on the STAGGER dataset and an improvement of up to 3% on the parallelisation datasets.

Adam Pocock, Paraskevas Yiapanis, Jeremy Singer, Mikel Luján, Gavin Brown

Handwriting Recognition

Combining Neural Networks to Improve Performance of Handwritten Keyword Spotting

Keyword spotting refers to the process of retrieving all instances of a given word in a document. It has received significant amounts of attention recently as an attractive alternative to full text transcription, and is particularly suited for tasks such as document searching and browsing. In the present paper we propose a combination of several keyword spotting systems for unconstrained handwritten text. The individual systems are based on a novel type of neural network. Due to their random initialization, a great variety in performance is observed among the neural networks. We demonstrate that by using a combination of several networks the best individual system can be outperformed.

Volkmar Frinken, Andreas Fischer, Horst Bunke

Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

Semi-supervised learning reduces the cost of labeling the training data of a supervised learning algorithm by using unlabeled data together with labeled data to improve performance. Co-Training is a popular semi-supervised learning algorithm that requires multiple redundant and independent sets of features (views). In many real-world application domains, this requirement cannot be satisfied. In this paper, a single-view variant of Co-Training, CoBC (Co-Training by Committee), is proposed, which requires an ensemble of diverse classifiers instead of the redundant and independent views. Then we introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and committee-based active learning. An empirical study on handwritten digit recognition is conducted where the random subspace method (RSM) is used to create ensembles of diverse C4.5 decision trees. Experiments show that these two combinations outperform the other non-committee-based ones.

Mohamed Farouk Abdel Hady, Friedhelm Schwenker

Using Diversity in Classifier Set Selection for Arabic Handwritten Recognition

A first look at Arabic manuscripts reveals the complexity of the recognition task, especially for the classifier ensemble used. One of the most important steps in the design of a multi-classifier system (MCS) is the choice of its components (classifiers). This step is very important to the overall MCS performance, since the combination of a set of identical classifiers will not outperform the individual members. To select the best classifier set from a pool of classifiers, classifier diversity is the most important property to be considered. The aim of this paper is to study Arabic handwriting recognition using MCS optimization based on diversity measures. The first approach selects the best classifier subset from a large classifier set, taking into account different diversity measures. The second one chooses from the classifier set the one with the best performance and adds it to the selected classifier subset. The performance in our approach is calculated using three diversity measures based on correlation between errors. On two databases, using 9 different classifiers, we then test the effect of the criterion to be optimized (diversity measures) and of the fusion method (voting, weighted voting and Behavior Knowledge Space). The experimental results presented are encouraging and open other perspectives in the classifier selection field, especially for Arabic handwritten word recognition.

Nabiha Azizi, Nadir Farah, Mokhtar Sellami, Abdel Ennaji

Applications

Forecast Combination Strategies for Handling Structural Breaks for Time Series Forecasting

Time-series forecasting is an important research and application area. Much effort has been devoted over the past decades to develop and improve the time series forecasting models based on statistical and machine learning techniques. Forecast combination is a well-established and well-tested approach for improving forecasting accuracy. Many time series may contain some structural breaks that may affect the performance of forecasting due to the varying nature of the dynamics with time. In this study we investigate the performance of using forecast combination in handling these breaks, and in mitigating the effects of discontinuities in time series.

Waleed M. Azmy, Amir F. Atiya, Hisham El-Shishiny

A Multiple Classifier System for Classification of LIDAR Remote Sensing Data Using Multi-class SVM

Rapid advances in remote sensing sensor technology have recently made it possible to collect new dense 3D data such as Light Detection And Ranging (LIDAR) data. One of the challenging issues with LIDAR data is its classification for the identification of different objects in urban areas, such as buildings, roads, and trees. Owing to the complexity of objects in urban areas and the inability of LIDAR to collect radiometric information about the surface, traditional classifiers perform poorly on LIDAR data. Combining classifiers is an established concept that is used to improve classification results. In this paper we propose a classifier fusion scheme based on Support Vector Machines (SVMs) for the classification of LIDAR data. Different SVMs are trained on the best different subsets of features that are suitable for object extraction in LIDAR data, chosen using RANSAC as the feature selection method. Two multiclass SVM methods, one-against-one and one-against-all, are investigated for the classification of LIDAR data, and the final decision is then obtained by majority voting. The results confirm that the proposed method improves classification accuracy on LIDAR data. It is also demonstrated that one-against-all yields better accuracy than one-against-one, although it is much more time-consuming.

Farhad Samadzadegan, Behnaz Bigdeli, Pouria Ramzi

A Multi-Classifier System for Off-Line Signature Verification Based on Dissimilarity Representation

Although widely used to reduce error rates in difficult pattern recognition problems, multiple classifier systems are not in widespread use in off-line signature verification. In this paper, a two-stage off-line signature verification system based on dissimilarity representation is proposed. In the first stage, a set of discrete HMMs trained with different numbers of states and/or different codebook sizes is used to calculate similarity measures that populate new feature vectors. In the second stage, these vectors are employed to train an SVM (or an ensemble of SVMs) that provides the final classification. Experiments performed using a real-world signature verification database (with random, simple and skilled forgeries) indicate that the proposed system can significantly reduce the overall error rates when compared to a traditional feature-based system using HMMs. Moreover, the use of an ensemble of SVMs in the second stage can reduce individual error rates by up to 10%.

Luana Batista, Eric Granger, Robert Sabourin

A Multi-objective Sequential Ensemble for Cluster Structure Analysis and Visualization and Application to Gene Expression

In the presence of huge high-dimensional datasets, it is important to investigate and visualize the connectivity of patterns in huge arbitrarily shaped clusters. While density-based or distance-relatedness-based clustering algorithms are used to efficiently discover clusters of arbitrary shapes and densities, classical (yet less efficient) clustering algorithms can be used to analyze the internal cluster structure and visualize it. In this work, a sequential ensemble that uses an efficient distance-relatedness-based clustering algorithm, “Mitosis”, followed by the centre-based K-means algorithm, is proposed. K-means is used to segment the clusters obtained by Mitosis into a number of subclusters. The ensemble is used to reveal the gradual change of patterns when applied to gene expression sets.

Noha A. Yousri

Combining 2D and 3D Features to Classify Protein Mutants in HeLa Cells

The field of high-throughput applications in biomedicine is constantly growing. Applications of this kind, which provide a huge amount of data, necessarily require semi-automated or fully automated analysis systems. Such systems are typically represented by classifiers capable of discerning among the different types of data obtained (i.e. classes). In this work we present a methodology to improve classification accuracy in the field of 3D confocal microscopy. A set of 3D cellular images (z-stacks) was taken, each depicting HeLa cells with different mutations of the UCE protein ([Mannose-6-Phosphate] UnCovering Enzyme). This dataset was classified to obtain the mutation class from the z-stacks. 3D and 2D features were extracted, and classifications were carried out with cell by cell and z-stack by z-stack approaches, with 2D or 3D features. Also, a classification approach that combines 2D and 3D features is proposed, which showed interesting improvements in the classification accuracy.

Carlo Sansone, Vincenzo Paduano, Michele Ceccarelli

An Experimental Comparison of Hierarchical Bayes and True Path Rule Ensembles for Protein Function Prediction

The computational genome-wide annotation of gene functions requires the prediction of hierarchically structured functional classes and can be formalized as a multiclass, multilabel, multipath hierarchical classification problem, characterized by very unbalanced classes. We recently proposed two hierarchical protein function prediction methods: the Hierarchical Bayes (hbayes) and True Path Rule (tpr) ensemble methods, both able to reconcile the prediction of component classifiers trained locally at each term of the ontology and to control the overall precision-recall trade-off. In this contribution, we focus on the experimental comparison of the hbayes and tpr hierarchical gene function prediction methods and their cost-sensitive variants, using the model organism S. cerevisiae and the FunCat taxonomy. The results show that cost-sensitive variants of these methods achieve comparable results, and significantly outperform both flat and their non cost-sensitive hierarchical counterparts.

Matteo Re, Giorgio Valentini
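
To make the idea of reconciling locally trained classifiers concrete, the sketch below applies a simple consistency step based on the true path rule, under which an annotation to a term implies annotation to all of its ancestors. This is a deliberately simplified post-processing step, not the hbayes or tpr ensemble algorithms compared in the paper; the function name, the threshold, the scores, and the FunCat-style term codes are all hypothetical.

```python
def enforce_true_path(scores, parent, threshold=0.5):
    """Label a term positive only if it and all of its ancestors pass the threshold."""
    labels = {}

    def is_positive(term):
        if term not in labels:
            own = scores[term] >= threshold
            up = parent.get(term)  # None for root-level terms
            labels[term] = own and (up is None or is_positive(up))
        return labels[term]

    for term in scores:
        is_positive(term)
    return labels

# Hypothetical per-term scores from independent local classifiers.
scores = {"01": 0.9, "01.01": 0.3, "01.01.03": 0.8, "02": 0.7}
parent = {"01.01": "01", "01.01.03": "01.01"}

labels = enforce_true_path(scores, parent)
```

Here the term "01.01.03" has a high local score but is labelled negative because its parent "01.01" falls below the threshold, which is the kind of inconsistency a flat approach would leave unresolved.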

Recognizing Combinations of Facial Action Units with Different Intensity Using a Mixture of Hidden Markov Models and Neural Network

The Facial Action Coding System consists of 44 action units (AUs) and more than 7000 combinations. Hidden Markov model (HMM) classifiers have been used successfully to recognize facial AUs and expressions because of their ability to deal with AU dynamics. However, a separate HMM is necessary for each single AU and each AU combination. Since AU combinations number in the thousands, a more efficient method is needed. In this paper, an accurate real-time sequence-based system for representation and recognition of facial AUs is presented. Our system has the following characteristics: 1) employing a mixture of HMMs and a neural network, we develop a novel accurate classifier which can deal with AU dynamics, recognize subtle changes, and is robust to intensity variations; 2) although we use an HMM for each single AU only, by employing a neural network we can recognize both single AUs and AU combinations; and 3) using both geometric and appearance-based features, and applying efficient dimension reduction techniques, our system is robust to illumination changes and can represent the temporal information involved in the formation of facial expressions. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method in comparison with other classifiers.

Mahmoud Khademi, Mohammad Taghi Manzuri-Shalmani, Mohammad Hadi Kiapour, Ali Akbar Kiaei

Invited Papers

Some Thoughts at the Interface of Ensemble Methods and Feature Selection

We are in a very exciting time for Machine Learning. The field is making its first steps toward true industrial-strength technology, slowly transitioning from a disparate collection of techniques, to a mature science. Multiple Classifier Systems in particular, are showing repeated successes at the most competitive of levels: winning the Netflix challenge, forming the backbone of cutting edge real-time computer vision, and most recently steering Google’s interests in quantum algorithms. It is thus becoming more and more difficult to generate truly meaningful contributions with our research. In the context of multiple classifier systems, we must ask ourselves, “how can we generate new MCS research that is truly meaningful?”

Gavin Brown

Multiple Classifier Systems for the Recognition of Human Emotions

Research in the area of human-computer interaction (HCI) has increasingly addressed the integration of some type of emotional intelligence into the system. Such systems must be able to recognize, interpret and create emotions. Although human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, most of the research in affective computing has been done on unimodal emotion recognition. In principle, a multimodal approach to emotion recognition should be more accurate and more robust against missing or noisy data. In this study we consider multiple classifier systems for the classification of facial expressions, and additionally present a prototype of an audio-visual laughter detection system. Finally, a novel implementation of a Java process engine for pattern recognition and information fusion is described.

Friedhelm Schwenker, Stefan Scherer, Miriam Schmidt, Martin Schels, Michael Glodek
