
2011 | Book

Ensembles in Machine Learning Applications

Editors: Oleg Okun, Giorgio Valentini, Matteo Re

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Computational Intelligence


About this book

This book contains the extended papers presented at the 3rd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), held in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2010, Barcelona, Catalonia, Spain).

Like its two predecessors, its main theme is ensembles of supervised and unsupervised algorithms, an advanced machine learning and data mining technique. Unlike a single classification or clustering algorithm, an ensemble is a group of algorithms, each of which first independently solves the task at hand by assigning a class or cluster label (a vote) to each instance in a dataset; all votes are then combined to produce the final class or cluster membership. As a result, ensembles often outperform the best single algorithm on many real-world problems.

This book consists of 14 chapters, each of which can be read independently of the others. In contrast to the two previous SUEMA editions, also published by Springer, many chapters in the current book include pseudocode and/or programming code of the algorithms they describe. This was done to facilitate the adoption of ensembles in practice and to help both researchers and engineers developing ensemble applications.

Table of Contents

Frontmatter
Facial Action Unit Recognition Using Filtered Local Binary Pattern Features with Bootstrapped and Weighted ECOC Classifiers
Abstract
Within the context of facial expression classification using the facial action coding system (FACS), we address the problem of detecting facial action units (AUs). The method adopted is to train a single Error-Correcting Output Code (ECOC) multiclass classifier to estimate the probabilities that each of several commonly occurring AU groups is present in the probe image. Platt scaling is used to calibrate the ECOC outputs to probabilities, and appropriate sums of these probabilities are taken to obtain a separate probability for each AU individually. Feature extraction is performed by generating a large number of local binary pattern (LBP) features and then selecting from these using fast correlation-based filtering (FCBF). The bias and variance properties of the classifier are measured, and we show that both of these sources of error can be reduced by enhancing ECOC through the application of bootstrapping and class-separability weighting.
Raymond S. Smith, Terry Windeatt
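The two-stage combination described above (per-column calibration, then probability sums) is easy to prototype. Below is a minimal sketch, not the authors' implementation: it assumes a fixed illustrative code matrix `CODE`, linear SVMs as the base dichotomizers, and scikit-learn's `LogisticRegression` fit on the SVM margins as the Platt scaler; the mapping from AU-group classes to individual AUs is left as a comment.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Hypothetical ECOC code matrix: rows = AU-group classes, columns = dichotomies.
CODE = np.array([[ 1, -1,  1],
                 [-1,  1,  1],
                 [ 1,  1, -1],
                 [-1, -1, -1]])

def train_ecoc(X, y):
    """Train one margin classifier per ECOC column, plus a Platt scaler
    (a sigmoid fit on the margins) calibrating each column to a probability."""
    models = []
    for col in range(CODE.shape[1]):
        target = CODE[y, col]                              # relabel classes as +/-1
        svm = LinearSVC().fit(X, target)
        margins = svm.decision_function(X).reshape(-1, 1)
        platt = LogisticRegression().fit(margins, target)  # Platt scaling
        models.append((svm, platt))
    return models

def group_probabilities(models, X):
    """Combine calibrated per-column probabilities into per-class scores
    (product of per-bit probabilities, then normalized). The probability of
    an individual AU would then be the sum of the probabilities of the AU
    groups that contain it (group membership assumed known)."""
    scores = np.ones((X.shape[0], CODE.shape[0]))
    for col, (svm, platt) in enumerate(models):
        m = svm.decision_function(X).reshape(-1, 1)
        p_pos = platt.predict_proba(m)[:, list(platt.classes_).index(1)]
        for k in range(CODE.shape[0]):
            scores[:, k] *= p_pos if CODE[k, col] == 1 else 1.0 - p_pos
    return scores / scores.sum(axis=1, keepdims=True)
```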
On the Design of Low Redundancy Error-Correcting Output Codes
Abstract
The classification of a large number of object categories is a challenging trend in the Pattern Recognition field. In the literature, this is often addressed using an ensemble of classifiers. In this scope, the Error-Correcting Output Codes (ECOC) framework has been demonstrated to be a powerful tool for combining classifiers. However, most of the state-of-the-art ECOC approaches use a linear or exponential number of classifiers, making the discrimination of a large number of classes infeasible. In this paper, we explore and propose a compact design of ECOC in terms of the number of classifiers. Evolutionary computation is used for tuning the parameters of the classifiers and searching for the best compact ECOC code configuration. The results over several public UCI data sets and different multi-class Computer Vision problems show that the proposed methodology obtains comparable (and even better) results than the state-of-the-art ECOC methodologies with far fewer dichotomizers.
Miguel Ángel Bautista, Sergio Escalera, Xavier Baró, Oriol Pujol, Jordi Vitrià, Petia Radeva
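As a rough illustration of the search component, here is a toy mutation-only evolutionary loop over compact code matrices. It is a sketch under simplifying assumptions: the fitness used is the minimum Hamming distance between codewords (a cheap stand-in for the cross-validated ensemble accuracy the chapter actually optimizes), and the classifier parameters are not co-evolved.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_code(n_classes, n_cols):
    """Random compact ECOC matrix with entries in {-1, +1}."""
    return rng.choice([-1, 1], size=(n_classes, n_cols))

def fitness(code):
    """Minimum pairwise Hamming distance between class codewords --
    a proxy for error-correcting capacity, not the chapter's criterion."""
    n = code.shape[0]
    dists = [np.sum(code[i] != code[j])
             for i in range(n) for j in range(i + 1, n)]
    # Penalize identical codewords and constant (useless) columns.
    if min(dists) == 0 or np.any(np.abs(code.sum(axis=0)) == n):
        return -1
    return min(dists)

def evolve(n_classes=8, n_cols=4, pop=30, gens=200):
    """Tiny evolutionary loop: keep the best half, mutate one bit per child."""
    population = [random_code(n_classes, n_cols) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]
        children = []
        for parent in survivors:
            child = parent.copy()
            i, j = rng.integers(n_classes), rng.integers(n_cols)
            child[i, j] *= -1                     # point mutation: flip one bit
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```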
Minimally-Sized Balanced Decomposition Schemes for Multi-class Classification
Abstract
Error-Correcting Output Coding (ECOC) is a well-known class of decomposition schemes for multi-class classification. It allows representing any multi-class classification problem as a set of binary classification problems. Due to code redundancy, ECOC schemes can significantly improve generalization performance on multi-class classification problems. However, they can face a computational-complexity problem when the number of classes is large.
In this paper we address the computational-complexity problem of the decomposition schemes. We study a particular class of minimally-sized ECOC decomposition schemes, namely the class of minimally-sized balanced decomposition schemes (MBDSs) [14]. We show that MBDSs do not face a computational-complexity problem for a large number of classes. However, we also show that MBDSs cannot correct the classification errors of the binary classifiers in MBDS ensembles. Therefore we propose voting with MBDS ensembles (VMBDSs). We show that the generalization performance of the VMBDSs ensembles improves with the number of MBDS classifiers. However, this number can become large, and thus the VMBDSs ensembles can have a computational-complexity problem as well. Fortunately, our experiments show that VMBDSs are comparable with ECOC ensembles and can outperform one-against-all ensembles using only a small number of MBDS ensembles.
Evgueni N. Smirnov, Matthijs Moed, Georgi Nalbantov, Ida Sprinkhuizen-Kuyper
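To make the construction concrete, the sketch below builds one MBDS as a set of ceil(log2 N) binary splits (distinct codewords per class, exactly balanced columns when N is a power of two, which is assumed here for simplicity) and decodes a VMBDS ensemble by majority vote over several such MBDSs. This follows the spirit rather than the letter of the chapter.

```python
import numpy as np

def mbds_code(n_classes, rng):
    """One minimally-sized balanced decomposition: each class gets a distinct
    codeword of length ceil(log2 n_classes); a random class permutation makes
    repeated draws differ."""
    n_bits = int(np.ceil(np.log2(n_classes)))
    perm = rng.permutation(n_classes)
    code = np.zeros((n_classes, n_bits), dtype=int)
    for cls, slot in enumerate(perm):
        code[cls] = [(slot >> b) & 1 for b in range(n_bits)]
    return 2 * code - 1                  # map {0, 1} -> {-1, +1}

def vmbds_predict(codes, bit_predictions):
    """VMBDS voting: decode each MBDS ensemble by nearest codeword (Hamming
    distance), then take a majority vote over the MBDS ensembles.
    bit_predictions[i] is an (n_samples, n_bits) array in {-1, +1}."""
    votes = []
    for code, bits in zip(codes, bit_predictions):
        dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
        votes.append(dists.argmin(axis=1))
    votes = np.stack(votes, axis=1)      # (n_samples, n_ensembles)
    return np.array([np.bincount(row).argmax() for row in votes])
```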
Bias-Variance Analysis of ECOC and Bagging Using Neural Nets
Abstract
One of the methods used to evaluate the performance of ensemble classifiers is bias and variance analysis. In this chapter, we analyse bootstrap aggregating (Bagging) and Error-Correcting Output Coding (ECOC) ensembles using a bias-variance framework, and make comparisons with single classifiers, while having Neural Networks (NNs) as base classifiers. As the performance of the ensembles depends on the individual base classifiers, it is important to understand the overall trends when the parameters of the base classifiers – the numbers of nodes and epochs for NNs – are changed. We show experimentally on 5 artificial and 4 UCI MLR datasets that there are some clear trends in the analysis that should be taken into consideration while designing NN classifier systems.
Cemre Zor, Terry Windeatt, Berrin Yanikoglu
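One common way to carry out such an analysis, sketched below, is a Domingos-style decomposition of 0/1 loss estimated by retraining the model on bootstrap resamples; the chapter's exact bias-variance framework may differ. NumPy arrays, non-negative integer class labels, and the MLP settings are all assumptions of this sketch.

```python
import numpy as np
from sklearn.base import clone
from sklearn.neural_network import MLPClassifier

def bias_variance_01(model, X_train, y_train, X_test, y_test,
                     n_rounds=50, seed=0):
    """Estimate Domingos-style bias and variance for 0/1 loss.
    Labels must be non-negative integers (for np.bincount)."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rounds, len(y_test)), dtype=int)
    for r in range(n_rounds):
        idx = rng.integers(0, len(y_train), len(y_train))   # bootstrap sample
        m = clone(model).fit(X_train[idx], y_train[idx])
        preds[r] = m.predict(X_test)
    # Main prediction: per-example majority vote across rounds.
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    bias = np.mean(main != y_test)                # main prediction is wrong
    variance = np.mean(preds != main[None, :])    # disagreement with main
    return bias, variance

# Example usage (placeholder NN settings):
# bias_variance_01(MLPClassifier(hidden_layer_sizes=(8,), max_iter=500),
#                  X_tr, y_tr, X_te, y_te)
```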
Fast-Ensembles of Minimum Redundancy Feature Selection
Abstract
Finding relevant subspaces in very high-dimensional data is a challenging task, and not only for microarray data. Feature selection aims to enhance classification performance, but on the other hand it must be stable, i.e., the set of selected features should not change when different subsets of a population are used. Ensemble methods have succeeded in increasing both stability and classification accuracy. However, their runtime prevents them from scaling up to real-world applications. We propose two methods which enhance correlation-based feature selection such that the stability of feature selection comes with little or even no extra runtime. We show the efficiency of the algorithms analytically and empirically on a wide range of datasets.
Benjamin Schowe, Katharina Morik
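The speed argument rests on computing correlations once and reusing them throughout. The snippet below shows a plain greedy minimum-redundancy/maximum-relevance pass over a precomputed Pearson correlation matrix; the authors' ensemble-level caching is more refined, so treat this as a baseline sketch.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy minimum-redundancy / maximum-relevance selection using absolute
    Pearson correlation, with all correlations computed once up front."""
    Xy = np.column_stack([X, y])
    C = np.abs(np.corrcoef(Xy, rowvar=False))
    relevance = C[:-1, -1]                  # |corr(feature, target)|
    redundancy = C[:-1, :-1]                # |corr(feature, feature)|
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        # Score: relevance minus mean correlation with already-chosen features.
        scores = [relevance[f] - redundancy[f, selected].mean()
                  for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```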
Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers
Abstract
The PC and TPDA algorithms are robust and well-known prototype algorithms incorporating constraint-based approaches for causal discovery. However, both algorithms cannot scale up to deal with high-dimensional data, that is, with more than a few hundred features. This chapter presents hybrid correlation and causal feature selection for ensemble classifiers to deal with this problem. Redundant features are removed by correlation-based feature selection, and then irrelevant features are eliminated by causal feature selection. The number of eliminated features, the accuracy, the area under the receiver operating characteristic curve (AUC), and the false negative rate (FNR) of the proposed algorithms are compared with those of correlation-based feature selection algorithms (FCBF and CFS) and causality-based feature selection algorithms (PC, TPDA, GS, IAMB).
Rakkrit Duangsoithong, Terry Windeatt
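A heavily simplified picture of the two-stage pipeline is sketched below: redundancy removal by pairwise correlation, followed by a first-order partial-correlation test standing in for the constraint-based causal step. The thresholds and the CI test are illustrative choices, not the chapter's PC/TPDA machinery.

```python
import numpy as np

def hybrid_select(X, y, redundancy_thr=0.9, ci_thr=0.05):
    """Stage 1: drop features highly correlated with a more relevant kept
    feature. Stage 2: drop features whose first-order partial correlation
    with the target, given some kept feature, (nearly) vanishes."""
    d = X.shape[1]
    C = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    rel = np.abs(C[:d, d])
    kept = []
    for f in np.argsort(-rel):               # most relevant first
        if all(abs(C[f, g]) < redundancy_thr for g in kept):
            kept.append(int(f))
    causal = []
    for f in kept:
        others = [g for g in kept if g != f]
        pcorrs = [abs((C[f, d] - C[f, g] * C[g, d]) /
                      np.sqrt((1 - C[f, g] ** 2) * (1 - C[g, d] ** 2)))
                  for g in others] or [rel[f]]
        if min(pcorrs) > ci_thr:              # no conditioning variable kills it
            causal.append(f)
    return causal
```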
Learning Markov Blankets for Continuous or Discrete Networks via Feature Selection
Abstract
Learning Markov Blankets is important for classification and regression, causal discovery, and Bayesian network learning. We present an argument that ensemble masking measures can provide an approximate Markov Blanket. Consequently, an ensemble feature selection method can be used to learn Markov Blankets for either discrete or continuous networks (without linearity or Gaussianity assumptions). We use masking measures for redundancy and statistical inference for feature selection criteria. We compare our performance in the causal structure learning problem to a collection of common feature selection methods. We also compare to Bayesian local structure learning. These results can also be easily extended to other causal structure models such as undirected graphical models.
Houtao Deng, Saylisse Davila, George Runger, Eugene Tuv
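One widely used ensemble-masking device for judging relevance, sketched here, is to add "artificial contrasts" (permuted copies of the real features) and keep only features whose forest importance beats a high quantile of the contrasts' importances. This is only in the spirit of the chapter; its statistical-inference criteria are more elaborate.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_by_contrasts(X, y, quantile=0.95, seed=0):
    """Keep features whose random-forest importance exceeds a high quantile
    of the importances of permuted (contrast) copies of the features."""
    rng = np.random.default_rng(seed)
    contrasts = np.column_stack([rng.permutation(col) for col in X.T])
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(np.column_stack([X, contrasts]), y)
    d = X.shape[1]
    real_imp = forest.feature_importances_[:d]
    threshold = np.quantile(forest.feature_importances_[d:], quantile)
    return np.flatnonzero(real_imp > threshold)
```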
Ensembles of Bayesian Network Classifiers Using Glaucoma Data and Expertise
Abstract
Bayesian Networks (BNs) are probabilistic graphical models that are popular in numerous fields. Here we propose these models to improve the classification of glaucoma, a major cause of blindness worldwide. We use visual field and retinal data to predict the early onset of glaucoma. In particular, the ability of BNs to deal with missing data allows us to select an optimal data-driven network by comparing supervised and semi-supervised models. An expertise-driven BN is also built by encoding expert knowledge in terms of relations between variables. In order to improve the overall classification performance and to explore the relations between glaucomatous data and expert knowledge, the expertise-driven network is combined with the selected data-driven network using a BN-based approach. An accuracy-weighted combination of these networks is also compared to the other models. The best performance is obtained with the semi-supervised data-driven network. However, combining it with the expertise-driven network improves performance in many cases and leads to interesting insights about the datasets, networks and metrics.
Stefano Ceccon, David Garway-Heath, David Crabb, Allan Tucker
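The accuracy-weighted combination mentioned above reduces to a weighted average of the two networks' posteriors. A minimal sketch, assuming the posterior arrays and validation accuracies are already available:

```python
import numpy as np

def accuracy_weighted_combine(p_data, p_expert, acc_data, acc_expert):
    """Accuracy-weighted average of the posteriors of two models (e.g. a
    data-driven and an expertise-driven Bayesian network); the weights are
    the models' validation accuracies, normalized to sum to one."""
    w = np.array([acc_data, acc_expert], dtype=float)
    w /= w.sum()
    return w[0] * p_data + w[1] * p_expert
```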
A Novel Ensemble Technique for Protein Subcellular Location Prediction
Abstract
In this chapter we present an ensemble classifier that performs multi-class classification by combining several kernel classifiers through a Decision Directed Acyclic Graph (DDAG). Each base classifier, called K-TIPCAC, is mainly based on the projection of the given points onto the Fisher subspace, estimated on the training data by means of a novel technique. The proposed multiclass classifier is applied to the task of protein subcellular location prediction, which is one of the most difficult multiclass prediction problems in modern computational biology. Although many methods have been proposed in the literature to solve this problem, all the existing approaches are affected by some limitations, so that the problem is still open. Experimental results clearly indicate that the proposed technique, called DDAG K-TIPCAC, performs comparably to, if not better than, state-of-the-art ensemble methods aimed at multi-class classification of highly unbalanced data.
Alessandro Rozza, Gabriele Lombardi, Matteo Re, Elena Casiraghi, Giorgio Valentini, Paola Campadelli
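For readers unfamiliar with DDAGs, the decision procedure itself is simple: pairwise classifiers arranged in a directed acyclic graph eliminate one candidate class at each node until a single class remains. A generic sketch, independent of the K-TIPCAC base classifier, where `pairwise[(i, j)]` is assumed to be a callable returning a positive score for class i and a negative one for class j:

```python
def ddag_predict(pairwise, classes, x):
    """Decision DAG over pairwise classifiers: start with the full class list
    and let each (i vs j) decision eliminate one class until one remains."""
    remaining = list(classes)
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        if pairwise[(i, j)](x) > 0:
            remaining.pop()          # classifier chose i: eliminate j
        else:
            remaining.pop(0)         # classifier chose j: eliminate i
    return remaining[0]
```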
Trading-Off Diversity and Accuracy for Optimal Ensemble Tree Selection in Random Forests
Abstract
We discuss an effective method for optimal ensemble tree selection in Random Forests that trades off the diversity and accuracy of the ensemble during the selection process. As the chances of overfitting increase dramatically with the size of the ensemble, we wrap cross-validation around the ensemble selection to maximize the amount of validation data, considering each fold in turn as the validation fold from which to select the trees. The aim is to increase performance by reducing the variance of the tree ensemble selection process. We demonstrate the effectiveness of our approach on several UCI and real-world data sets.
Haytham Elghazel, Alex Aussem, Florence Perraud
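The core trade-off can be illustrated with a greedy forward selection on a validation fold, scoring each candidate tree by a weighted sum of its accuracy and its disagreement with the current sub-ensemble. The weight `alpha` and the disagreement measure are illustrative choices, not the chapter's exact criterion; predictions are assumed to be non-negative integer labels.

```python
import numpy as np

def select_trees(tree_preds, y_val, n_select, alpha=0.5):
    """Greedy forward selection of trees trading off validation accuracy
    against diversity. tree_preds is an (n_trees, n_val) integer array of
    each tree's predictions on the validation fold."""
    n_trees = tree_preds.shape[0]
    chosen = [int(np.argmax((tree_preds == y_val).mean(axis=1)))]
    while len(chosen) < n_select:
        # Current sub-ensemble prediction: per-example majority vote.
        ens = np.array([np.bincount(col).argmax()
                        for col in tree_preds[chosen].T])
        best, best_score = None, -np.inf
        for t in range(n_trees):
            if t in chosen:
                continue
            acc = (tree_preds[t] == y_val).mean()
            div = (tree_preds[t] != ens).mean()   # disagreement = diversity
            score = alpha * acc + (1 - alpha) * div
            if score > best_score:
                best, best_score = t, score
        chosen.append(best)
    return chosen
```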
Random Oracles for Regression Ensembles
Abstract
This paper considers the use of Random Oracles in ensembles for regression tasks. A Random Oracle model (Kuncheva and Rodríguez, 2007) consists of a pair of models and a fixed, randomly created “oracle” (in the case of the Linear Random Oracle, a hyperplane that divides the dataset in two during training and, once the ensemble is trained, decides which model to use). Random Oracles can be used as the base model for any ensemble method, and have previously been used for classification. Here, the use of Random Oracles for regression is studied using 61 datasets, Regression Trees as base models, and several ensemble methods: Bagging, Random Subspaces, AdaBoost.R2 and Iterated Bagging. For all the considered methods and variants, ensembles with Random Oracles are better than the corresponding versions without the Oracles.
Carlos Pardo, Juan J. Rodríguez, José F. Díez-Pastor, César García-Osorio
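A minimal sketch of a Linear Random Oracle for regression, assuming the hyperplane is taken as the perpendicular bisector of two randomly drawn (distinct) training points and regression trees as the pair of models:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class LinearRandomOracle:
    """A random hyperplane (perpendicular bisector of two randomly drawn
    training points) splits the data; one regressor is trained per side and
    the hyperplane routes each test point to its model."""
    def __init__(self, base=DecisionTreeRegressor, seed=0):
        self.base = base
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        a, b = X[self.rng.choice(len(X), size=2, replace=False)]
        # Points with x @ w > t are closer to a; closer to b otherwise.
        self.w = a - b
        self.t = (np.dot(a, a) - np.dot(b, b)) / 2.0
        side = X @ self.w > self.t
        self.models = [self.base().fit(X[~side], y[~side]),
                       self.base().fit(X[side], y[side])]
        return self

    def predict(self, X):
        side = (X @ self.w > self.t).astype(int)
        out = np.empty(len(X))
        for s in (0, 1):
            mask = side == s
            if mask.any():
                out[mask] = self.models[s].predict(X[mask])
        return out
```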
Embedding Random Projections in Regularized Gradient Boosting Machines
Abstract
Random Projections are a suitable technique for dimensionality reduction in Machine Learning. In this work, we propose a novel Boosting technique based on embedding Random Projections in a regularized gradient boosting ensemble. Random Projections are studied from different points of view: pure Random Projections, normalized projections, and uniform binary projections. Furthermore, we study the effect of keeping or changing the dimensionality of the data space. Experimental results performed on synthetic and UCI datasets show that Boosting methods with embedded random data projections are competitive with AdaBoost and Regularized Boosting.
Pierluigi Casale, Oriol Pujol, Petia Radeva
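The embedding idea can be shown with a tiny L2 gradient-boosting loop in which every stage sees the data through a fresh Gaussian random projection. By default the sketch keeps the original dimensionality (one of the variants studied); it is illustrative and omits the chapter's regularized formulation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rp_boost_fit(X, y, n_stages=50, lr=0.1, k=None, seed=0):
    """L2 boosting where each stage fits the residual in a freshly
    random-projected space (k = target dimensionality, default unchanged)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = k or d
    F = np.full(len(y), y.mean())                  # initial constant model
    stages = [y.mean()]
    for _ in range(n_stages):
        P = rng.normal(size=(d, k)) / np.sqrt(k)   # Gaussian random projection
        tree = DecisionTreeRegressor(max_depth=3).fit(X @ P, y - F)
        F += lr * tree.predict(X @ P)
        stages.append((P, tree))
    return stages

def rp_boost_predict(stages, X, lr=0.1):
    """lr must match the value used when fitting."""
    F = np.full(len(X), stages[0])
    for P, tree in stages[1:]:
        F += lr * tree.predict(X @ P)
    return F
```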
An Improved Mixture of Experts Model: Divide and Conquer Using Random Prototypes
Abstract
The Mixture of Experts (ME) is one of the most popular ensemble methods used in Pattern Recognition and Machine Learning. This algorithm stochastically partitions the input space of a problem into a number of subspaces, with experts becoming specialized on each subspace. To manage this process, the ME uses an expert called the gating network, which is trained together with the other experts. In this chapter, we propose a modified version of the ME algorithm which first partitions the original problem into centralized regions and then uses a simple distance-based gating function to specialize the expert networks. Each expert contributes to the classification of an input sample according to the distance between the input and a prototype embedded by the expert. The Hierarchical Mixture of Experts (HME) is a tree-structured architecture which can be considered a natural extension of the ME model. The training and testing strategies of the standard HME model are also modified, based on the same insight applied to the standard ME. In both cases, the proposed approach does not require training the gating networks, as they are implemented with simple distance-based rules. In so doing, the overall time required for training a modified ME/HME system is considerably lower. Moreover, centralizing input subspaces and adopting a random strategy for selecting prototypes makes it possible to simultaneously increase the individual accuracy and the diversity of the ME/HME modules, which in turn increases the accuracy of the overall ensemble. Experimental results on a binary toy problem and on selected datasets from the UCI machine learning repository show the robustness of the proposed methods compared to the standard ME/HME models.
Giuliano Armano, Nima Hatami
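The distance-based gating is the key simplification: no gating network is trained. A minimal sketch with randomly chosen training points as prototypes and hard nearest-prototype routing; logistic-regression experts are a placeholder for the expert networks, and each region is assumed to contain at least two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_prototype_me(X, y, n_experts=4, seed=0):
    """Gating is a fixed nearest-prototype rule over randomly selected
    training points; each expert is fit only on its own region."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=n_experts, replace=False)]
    region = np.argmin(((X[:, None, :] - prototypes[None]) ** 2).sum(-1),
                       axis=1)
    experts = [LogisticRegression(max_iter=1000)
               .fit(X[region == e], y[region == e])
               for e in range(n_experts)]
    return prototypes, experts

def predict_prototype_me(prototypes, experts, X):
    region = np.argmin(((X[:, None, :] - prototypes[None]) ** 2).sum(-1),
                       axis=1)
    out = np.empty(len(X), dtype=int)
    for e, expert in enumerate(experts):
        mask = region == e
        if mask.any():
            out[mask] = expert.predict(X[mask])
    return out
```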
Three Data Partitioning Strategies for Building Local Classifiers
Abstract
The divide-and-conquer approach has been recognized in multiple classifier systems as a way to utilize the local expertise of individual classifiers. In this study we experimentally investigate three strategies for building local classifiers, based on different routines for sampling the training data. The first two strategies are based on clustering the training data and building an individual classifier for each cluster or combination of clusters. The third strategy divides the training set based on a selected feature and trains a separate classifier for each subset. Experiments are carried out on simulated and real datasets. We report an improvement in the final classification accuracy as a result of combining the three strategies.
Indrė Žliobaitė
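The first strategy, for instance, can be prototyped in a few lines: cluster the training data, fit one local classifier per cluster, and route each test point to the classifier of its nearest centroid. KMeans and logistic regression are placeholder choices here, and each cluster is assumed to contain more than one class.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_local_classifiers(X, y, n_clusters=3, seed=0):
    """Cluster the training data and fit one local classifier per cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X)
    local = [LogisticRegression(max_iter=1000)
             .fit(X[km.labels_ == c], y[km.labels_ == c])
             for c in range(n_clusters)]
    return km, local

def predict_local(km, local, X):
    """Route each test point to the classifier of its nearest cluster."""
    c = km.predict(X)
    out = np.empty(len(X), dtype=int)
    for i, clf in enumerate(local):
        mask = c == i
        if mask.any():
            out[mask] = clf.predict(X[mask])
    return out
```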
Backmatter
Metadata
Title
Ensembles in Machine Learning Applications
Editors
Oleg Okun
Giorgio Valentini
Matteo Re
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-22910-7
Print ISBN
978-3-642-22909-1
DOI
https://doi.org/10.1007/978-3-642-22910-7
