Skip to main content



Invited Contributions

Discrete Component Analysis

This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, non-negative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational approximation, Gibbs sampling, and Rao-Blackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and for the Reuters-21578 newswire collection.
Wray Buntine, Aleks Jakulin

Overview and Recent Advances in Partial Least Squares

Partial Least Squares (PLS) is a wide class of methods for modeling relations between sets of observed variables by means of latent variables. It comprises of regression and classification tasks as well as dimension reduction techniques and modeling tools. The underlying assumption of all PLS methods is that the observed data is generated by a system or process which is driven by a small number of latent (not directly observed or measured) variables. Projections of the observed data to its latent structure by means of PLS was developed by Herman Wold and coworkers [48,49,52].
Roman Rosipal, Nicole Krämer

Random Projection, Margins, Kernels, and Feature-Selection

Random projection is a simple technique that has had a number of applications in algorithm design. In the context of machine learning, it can provide insight into questions such as “why is a learning problem easier if data is separable by a large margin?” and “in what sense is choosing a kernel much like choosing a set of features?” This talk is intended to provide an introduction to random projection and to survey some simple learning algorithms and other applications to learning based on it. I will also discuss how, given a kernel as a black-box function, we can use various forms of random projection to extract an explicit small feature space that captures much of what the kernel is doing. This talk is based in large part on work in [BB05, BBV04] joint with Nina Balcan and Santosh Vempala.
Avrim Blum

Some Aspects of Latent Structure Analysis

Latent structure models involve real, potentially observable variables and latent, unobservable variables. The framework includes various particular types of model, such as factor analysis, latent class analysis, latent trait analysis, latent profile models, mixtures of factor analysers, state-space models and others. The simplest scenario, of a single discrete latent variable, includes finite mixture models, hidden Markov chain models and hidden Markov random field models. The paper gives a brief tutorial of the application of maximum likelihood and Bayesian approaches to the estimation of parameters within these models, emphasising especially the fact that computational complexity varies greatly among the different scenarios. In the case of a single discrete latent variable, the issue of assessing its cardinality is discussed. Techniques such as the EM algorithm, Markov chain Monte Carlo methods and variational approximations are mentioned.
D. M. Titterington

Feature Selection for Dimensionality Reduction

Dimensionality reduction is a commonly used step in machine learning, especially when dealing with a high dimensional space of features. The original feature space is mapped onto a new, reduced dimensionally space. The dimensionality reduction is usually performed either by selecting a subset of the original dimensions or/and by constructing new dimensions. This paper deals with feature subset selection for dimensionality reduction in machine learning. We provide a brief overview of the feature subset selection techniques that are commonly used in machine learning. Detailed description is provided for feature subset selection as commonly used on text data. For illustration, we show performance of several methods on document categorization of real-world data.
Dunja Mladenić

Contributed Papers

Auxiliary Variational Information Maximization for Dimensionality Reduction

Mutual Information (MI) is a long studied measure of information content, and many attempts to apply it to feature extraction and stochastic coding have been made. However, in general MI is computationally intractable to evaluate, and most previous studies redefine the criterion in forms of approximations. Recently we described properties of a simple lower bound on MI, and discussed its links to some of the popular dimensionality reduction techniques [1]. Here we introduce a richer family of auxiliary variational bounds on MI, which generalizes our previous approximations. Our specific focus then is on applying the bound to extracting informative lower-dimensional projections in the presence of irreducible Gaussian noise. We show that our method produces significantly tighter bounds than the well-known as-if Gaussian approximations of MI. We also show that the auxiliary variable method may help to significantly improve on reconstructions from noisy lower-dimensional projections.
Felix Agakov, David Barber

Constructing Visual Models with a Latent Space Approach

We propose the use of latent space models applied to local invariant features for object classification. We investigate whether using latent space models enables to learn patterns of visual co-occurrence and if the learned visual models improve performance when less labeled data are available. We present and discuss results that support these hypotheses. Probabilistic Latent Semantic Analysis (PLSA) automatically identifies aspects from the data with semantic meaning, producing unsupervised soft clustering. The resulting compact representation retains sufficient discriminative information for accurate object classification, and improves the classification accuracy through the use of unlabeled data when less labeled training data are available. We perform experiments on a 7-class object database containing 1776 images.
Florent Monay, Pedro Quelhas, Daniel Gatica-Perez, Jean-Marc Odobez

Is Feature Selection Still Necessary?

Feature selection is usually motivated by improved computational complexity, economy and problem understanding, but it can also improve classification accuracy in many cases. In this paper we investigate the relationship between the optimal number of features and the training set size. We present a new and simple analysis of the well-studied two-Gaussian setting. We explicitly find the optimal number of features as a function of the training set size for a few special cases and show that accuracy declines dramatically by adding too many features. Then we show empirically that Support Vector Machine (SVM), that was designed to work in the presence of a large number of features produces the same qualitative result for these examples. This suggests that good feature selection is still an important component in accurate classification.
Amir Navot, Ran Gilad-Bachrach, Yiftah Navot, Naftali Tishby

Class-Specific Subspace Discriminant Analysis for High-Dimensional Data

We propose a new method for discriminant analysis, called High Dimensional Discriminant Analysis (HDDA). Our approach is based on the assumption that high dimensional data live in different subspaces with low dimensionality. We therefore propose a new parameterization of the Gaussian model to classify high-dimensional data. This parameterization takes into account the specific subspace and the intrinsic dimension of each class to limit the number of parameters to estimate. HDDA is applied to recognize object parts in real images and its performance is compared to classical methods.
Charles Bouveyron, Stéphane Girard, Cordelia Schmid

Incorporating Constraints and Prior Knowledge into Factorization Algorithms – An Application to 3D Recovery

Matrix factorization is a fundamental building block in many computer vision and machine learning algorithms. In this work we focus on the problem of ”structure from motion” in which one wishes to recover the camera motion and the 3D coordinates of certain points given their 2D locations. This problem may be reduced to a low rank factorization problem. When all the 2D locations are known, singular value decomposition yields a least squares factorization of the measurements matrix. In realistic scenarios this assumption does not hold: some of the data is missing, the measurements have correlated noise, and the scene may contain multiple objects. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this work we present an EM algorithm for matrix factorization that takes advantage of prior information and imposes strict constraints on the resulting matrix factors. We present results on challenging sequences.
Amit Gruber, Yair Weiss

A Simple Feature Extraction for High Dimensional Image Representations

We investigate a method to find local clusters in low dimensional subspaces of high dimensional data, e.g. in high dimensional image descriptions. Using cluster centers instead of the full set of data will speed up the performance of learning algorithms for object recognition, and might also improve performance because overfitting is avoided. Using the Graz01 database, our method outperforms a current standard method for feature extraction from high dimensional image representations.
Christian Savu-Krohn, Peter Auer

Identifying Feature Relevance Using a Random Forest

It is known that feature selection and feature relevance can benefit the performance and interpretation of machine learning algorithms. Here we consider feature selection within a Random Forest framework. A feature selection technique is introduced that combines hypothesis testing with an approximation to the expected performance of an irrelevant feature during Random Forest construction.
It is demonstrated that the lack of implicit feature selection within Random Forest has an adverse effect on the accuracy and efficiency of the algorithm. It is also shown that irrelevant features can slow the rate of error convergence and a theoretical justification of this effect is given.
Jeremy Rogers, Steve Gunn

Generalization Bounds for Subspace Selection and Hyperbolic PCA

We present a method which uses example pairs of equal or unequal class labels to select a subspace with near optimal metric properties in a kernel-induced Hilbert space. A representation of finite dimensional projections as bounded linear functionals on a space of Hilbert-Schmidt operators leads to PAC-type performance guarantees for the resulting feature maps. The proposed algorithm returns the projection onto the span of the principal eigenvectors of an empirical operator constructed in terms of the example pairs. It can be applied to meta-learning environments and experiments demonstrate an effective transfer of knowledge between different but related learning tasks.
Andreas Maurer

Less Biased Measurement of Feature Selection Benefits

In feature selection, classification accuracy typically needs to be estimated in order to guide the search towards the useful subsets. It has earlier been shown [1] that such estimates should not be used directly to determine the optimal subset size, or the benefits due to choosing the optimal set. The reason is a phenomenon called overfitting, thanks to which these estimates tend to be biased. Previously, an outer loop of cross-validation has been suggested for fighting this problem. However, this paper points out that a straightforward implementation of such an approach still gives biased estimates for the increase in accuracy that could be obtained by selecting the best-performing subset. In addition, two methods are suggested that are able to circumvent this problem and give virtually unbiased results without adding almost any computational overhead.
Juha Reunanen


Weitere Informationen

Premium Partner