
About this book

This book constitutes the thoroughly refereed, revised selected papers from the First IAPR TC3 Workshop on Partially Supervised Learning, PSL 2011, held in Ulm, Germany, in September 2011. The 14 papers presented in this volume were carefully reviewed and selected for inclusion in the book, which also includes 3 invited talks. PSL 2011 dealt with methodological issues as well as real-world applications of PSL. The main methodological issues were: combination of supervised and unsupervised learning; diffusion learning; semi-supervised classification, regression, and clustering; learning with deep architectures; active learning; PSL with vague, fuzzy, or uncertain teaching signals; learning, or statistical pattern recognition; and PSL in cognitive systems. Applications of PSL included: image and signal processing; multi-modal information processing; sensor/information fusion; human-computer interaction; data mining and Web mining; forensic anthropology; and bioinformatics.



Invited Talks

Unlabeled Data and Multiple Views

In many real-world applications, unlabeled data are abundant but the number of labeled training examples is often limited, since labeling the data requires extensive human effort and expertise. Thus, exploiting unlabeled data to help improve the learning performance has attracted significant attention. Major techniques for this purpose include semi-supervised learning and active learning. These techniques were initially developed for data with a single view, that is, a single feature set, while recent studies have shown that for multi-view data, semi-supervised learning and active learning can work amazingly well. This article briefly reviews some recent advances in this thread of research.
Zhi-Hua Zhou

Online Semi-supervised Ensemble Updates for fMRI Data

Advances in electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) have opened up the possibility of real-time data classification. A small amount of labelled training data is usually available, followed by a large stream of unlabelled data. Noise and possible concept drift pose a further challenge. A fixed pre-trained classifier may not always work. One solution is to update the classifier in real time. Since true labels are not available, the classifier is updated using the predicted label, a method called naive labelling. We propose to use classifier ensembles in order to counteract the adverse effect of ‘run-away’ classifiers associated with naive labelling. A new ensemble method for naive labelling is proposed. The label taken to update each member classifier is the ensemble prediction. We use an fMRI dataset to demonstrate the advantage of the proposed method over the fixed classifier and the single classifier updated through naive labelling.
Catrin O. Plumpton
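
The naive-labelling idea in the abstract above can be sketched in a few lines. This is only an illustrative sketch, not the authors' method: the nearest-mean member classifier, the per-class bootstrap, the majority vote, and all names (`NearestMeanClassifier`, `make_ensemble`, `ensemble_naive_labelling`) are assumptions chosen for brevity. What it does share with the abstract is the update rule: every member is updated with the ensemble prediction rather than with its own prediction.

```python
import numpy as np

N_CLASSES = 2  # assumption: a two-class problem


class NearestMeanClassifier:
    """Nearest-mean classifier whose class means can be updated online."""

    def __init__(self, X, y):
        # One mean vector and one sample count per class.
        self.means = np.stack([X[y == c].mean(axis=0) for c in range(N_CLASSES)])
        self.counts = np.array([(y == c).sum() for c in range(N_CLASSES)], dtype=float)

    def predict(self, x):
        # Assign x to the class with the closest mean.
        return int(np.argmin(np.linalg.norm(self.means - x, axis=1)))

    def update(self, x, label):
        # Running-mean update of the class indicated by `label`.
        self.counts[label] += 1.0
        self.means[label] += (x - self.means[label]) / self.counts[label]


def make_ensemble(X, y, n_members, rng):
    """Train each member on a per-class bootstrap sample of the labelled data."""
    members = []
    for _ in range(n_members):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
            for c in range(N_CLASSES)
        ])
        members.append(NearestMeanClassifier(X[idx], y[idx]))
    return members


def ensemble_naive_labelling(members, stream):
    """Classify a stream; update every member with the ensemble's majority vote."""
    predictions = []
    for x in stream:
        votes = [m.predict(x) for m in members]
        label = int(np.bincount(votes, minlength=N_CLASSES).argmax())
        for m in members:
            m.update(x, label)  # the ensemble label, not the member's own vote
        predictions.append(label)
    return np.array(predictions)
```

Using the ensemble label for every update means a single member that starts to drift is pulled back by the others, which is the intuition behind counteracting "run-away" classifiers.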

Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition

Automatic emotion classification is a task that has been studied from very different approaches. Previous research shows that performance similar to that of humans can be achieved by an adequate combination of modalities and features. Nevertheless, large amounts of training data seem necessary to reach a similar level of accuracy in automatic classification. The labelling of training, validation and test sets is generally a difficult and time-consuming task that restricts the experiments. Therefore, in this work we study self- and active-training methods and their performance in the task of emotion classification from speech data, with the aim of reducing annotation costs. The results are compared, using confusion matrices, with human perception capabilities and with supervised training experiments, yielding similar accuracies.
José Esparza, Stefan Scherer, Friedhelm Schwenker


Semi-supervised Linear Discriminant Analysis Using Moment Constraints

A semi-supervised version of Fisher’s linear discriminant analysis is presented. As opposed to virtually all other approaches to semi-supervision, no assumptions on the data distribution are made, apart from the ones explicitly or implicitly present in standard supervised learning. Our approach exploits the fact that the parameters that are to be estimated in linear discriminant analysis fulfill particular relations that link label-dependent with label-independent quantities. In this way, the latter type of parameters, which can be estimated based on unlabeled data, impose constraints on the former and lead to a reduction in variability of the label-dependent estimates. As a result, the performance of our semi-supervised linear discriminant is expected to improve over that of its supervised counterpart and typically does not deteriorate with increasing numbers of unlabeled data.
Marco Loog
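
As an illustration of the kind of relation the abstract above refers to, two standard identities (the laws of total expectation and total covariance) link label-dependent and label-independent quantities. Whether these are exactly the constraints used in the paper is an assumption here; the symbols are also introduced only for this sketch: class priors \pi_k, class means \mu_k, a shared within-class covariance \Sigma_W, the overall mean \mu, and the total covariance \Sigma_T.

```latex
\mu = \sum_{k=1}^{K} \pi_k \, \mu_k ,
\qquad
\Sigma_T = \Sigma_W + \sum_{k=1}^{K} \pi_k \, (\mu_k - \mu)(\mu_k - \mu)^{\top}
```

The left-hand sides do not depend on labels, so they can be estimated from all data, labeled and unlabeled. The right-hand sides involve only label-dependent quantities, which the identities then constrain.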

Manifold-Regularized Minimax Probability Machine

In this paper we propose the Manifold-Regularized Minimax Probability Machine, called MRMPM. We show that the Minimax Probability Machine can properly be extended to a semi-supervised version in the manifold regularization framework and that its kernelized version is obtained for the non-linear case. Our experiments show that the proposed methods achieve results competitive with existing learning methods, such as the Laplacian Support Vector Machine and Laplacian Regularized Least Squares, on publicly available datasets from the UCI machine learning repository.
Kazuki Yoshiyama, Akito Sakurai

Supervised and Unsupervised Co-training of Adaptive Activation Functions in Neural Nets

In spite of the nice theoretical properties of mixtures of logistic activation functions, standard feedforward neural networks with limited resources and gradient-descent optimization of the connection weights may fail in practice on several difficult learning tasks. Such tasks would be better faced by relying on a more appropriate, problem-specific basis of activation functions. The paper introduces a connectionist model which features adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(·), p(·)), where f(·) (the activation function itself) is modeled via a specialized neural network, and p(·) is a probabilistic measure of the likelihood of the unit itself being relevant to the computation of the output over the current input. While f(·) is optimized in a supervised manner (through a novel backpropagation scheme of the target outputs which does not suffer from the traditional phenomenon of “vanishing gradient” that occurs in standard backpropagation), p(·) is realized via a statistical parametric model learned through unsupervised estimation. The overall machine is implicitly a co-trained coupled model, where the topology chosen for learning each f(·) may vary on a unit-by-unit basis, resulting in a highly non-standard neural architecture.
Ilaria Castelli, Edmondo Trentin

Semi-unsupervised Weighted Maximum-Likelihood Estimation of Joint Densities for the Co-training of Adaptive Activation Functions

The paper presents an explicit maximum-likelihood algorithm for the estimation of the probabilistic-weighting density functions that are associated with individual adaptive activation functions in neural networks. A partially unsupervised technique is devised which takes into account the joint distribution of input features and target outputs. Combined with the training algorithm introduced in the companion paper [2], the solution proposed herein realizes a well-defined, specific instance of the novel learning machine. The extension of the overall training method to architectures with more than one hidden layer is also pointed out. A preliminary experimental demonstration is given, outlining how the algorithm works.
Ilaria Castelli, Edmondo Trentin

Semi-Supervised Kernel Clustering with Sample-to-Cluster Weights

Collecting unlabelled data is often effortless, while labelling them can be difficult. Either the amount of data is too large or samples cannot be assigned a specific class label with certainty. In semi-supervised clustering the aim is to set the cluster centres close to their label-matching samples and unlabelled samples. Kernel-based clustering methods are known to improve the cluster results by clustering in feature space. In this paper we propose a semi-supervised kernel-based clustering algorithm that convergently minimizes an error function with sample-to-cluster weights. These sample-to-cluster weights are set depending on the class label, i.e. matching, not matching or unlabelled. The algorithm is able to use many kernel-based clustering methods, although we suggest Kernel Fuzzy C-Means, Relational Neural Gas and Kernel K-Means. We empirically evaluate the performance of this algorithm on two real-life datasets, namely Steel Plates Faults and MiniBooNE.
Stefan Faußer, Friedhelm Schwenker

Homeokinetic Reinforcement Learning

In order to find a control policy for an autonomous robot by reinforcement learning, the utility of a behaviour can be revealed locally through a modulation of the motor command by probing actions. For robots with many degrees of freedom, this type of exploration becomes inefficient such that it is an interesting option to use an auxiliary controller for the selection of promising probing actions. We suggest here to optimise the exploratory modulation by a self-organising controller. The approach is illustrated by two control tasks, namely swing-up of a pendulum and walking in a simulated hexapod. The results imply that the homeokinetic approach is beneficial for high complexity problems.
Simón C. Smith, J. Michael Herrmann

Iterative Refinement of HMM and HCRF for Sequence Classification

We propose a strategy for semi-supervised learning of Hidden-state Conditional Random Fields (HCRF) for signal classification. It builds on simple procedures for semi-supervised learning of Hidden Markov Models (HMM) and on strategies for learning an HCRF from a trained HMM system. The algorithm learns a generative system based on HMMs and a discriminative one based on HCRFs, where each model is refined by the other in an iterative framework.
Yann Soullard, Thierry Artieres


On the Utility of Partially Labeled Data for Classification of Microarray Data

Microarrays are standard tools for measuring thousands of gene expression levels simultaneously. They are frequently used in the classification process of tumor tissues. In this setting a collected set of samples often consists of only a few dozen data points. Common approaches for classifying such data are supervised. They exclusively use categorized data for training a classification model. Restricted to a small number of samples, these algorithms are affected by overfitting and often lack good generalization performance. An implicit assumption of supervised methods is that only labeled training samples exist. This assumption does not always hold. In medical studies, additional unlabeled samples are often available that cannot be categorized for some time (e.g., “early relapse” vs. “late relapse”). Alternative classification approaches, such as semi-supervised or transductive algorithms, are able to utilize this partially labeled data. Here, we empirically investigate five semi-supervised and transductive algorithms as “early prediction tools” for incompletely labeled datasets of high dimensionality and low cardinality. Our experimental setup consists of cross-validation experiments under varying ratios of labeled to unlabeled examples. Most interestingly, the best cross-validation performance is not always achieved for completely labeled data, but rather for partially labeled datasets, indicating the strong influence of label information on the classification process, even in the linearly separable case.
Ludwig Lausser, Florian Schmid, Hans A. Kestler

Multi-instance Methods for Partially Supervised Image Segmentation

In this paper, we propose a new partially supervised multi-class image segmentation algorithm. We focus on the multi-class, single-label setup, where each image is assigned one of multiple classes. We formulate the problem of image segmentation as a multi-instance task on a given set of overlapping candidate segments. Using these candidate segments, we solve the multi-instance, multi-class problem using multi-instance kernels with an SVM. This computationally advantageous approach, which requires only convex optimization, yields encouraging results on the challenging problem of partially supervised image segmentation.
Andreas Müller, Sven Behnke

Semi-supervised Training Set Adaption to Unknown Countries for Traffic Sign Classifiers

Traffic signs in Western European countries share many similarities but also can vary in colour, size, and depicted symbols. Statistical pattern classification methods are used for the automatic recognition of traffic signs in state-of-the-art driver assistance systems. Training a classifier separately for each country requires a huge amount of training data labelled by human annotators. In order to reduce these efforts, a self-learning approach extends the recognition capability of an initial German classifier to other European countries. After the most informative samples have been selected by the confidence band method from a given pool of unlabelled traffic signs, the classifier assigns labels to them. Furthermore, the performance of the self-learning classifier is improved by incorporating synthetically generated samples into the self-learning process. The achieved classification rates are comparable to those of classifiers trained with fully labelled samples.
Matthias Hillebrand, Christian Wöhler, Ulrich Kreßel, Franz Kummert
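
The self-learning loop outlined in the abstract above can be sketched as follows. This is a deliberately simplified illustration and not the authors' pipeline: the confidence band method is not reproduced and is replaced here by a plain confidence threshold, the classifier is a nearest-centroid model, and every name and parameter (`fit_centroids`, `predict_with_confidence`, `self_train`, `threshold`, `max_rounds`) is hypothetical.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Nearest-centroid model: one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_with_confidence(centroids, X):
    """Return predicted labels and a softmax-style confidence per sample."""
    # Pairwise distances, shape (n_samples, n_classes).
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    scores = np.exp(-dist)
    proba = scores / scores.sum(axis=1, keepdims=True)
    return proba.argmax(axis=1), proba.max(axis=1)


def self_train(X_lab, y_lab, X_pool, n_classes, threshold=0.8, max_rounds=10):
    """Grow the training set with confidently self-labelled pool samples."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = X_pool.copy()
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        labels, conf = predict_with_confidence(centroids, pool)
        chosen = conf >= threshold  # assumption: simple threshold selection
        if not chosen.any():
            break
        # The classifier's own labels are trusted for the selected samples.
        X_train = np.vstack([X_train, pool[chosen]])
        y_train = np.concatenate([y_train, labels[chosen]])
        pool = pool[~chosen]
    # Final model plus the number of pool samples that were never selected.
    return fit_centroids(X_train, y_train, n_classes), len(pool)
```

In the setting of the abstract, `X_lab` would play the role of the labelled data from the initial country and `X_pool` the unlabelled signs from a new country. The synthetic-sample augmentation mentioned in the abstract is not part of this sketch.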

Comparison of Combined Probabilistic Connectionist Models in a Forensic Application

A growing interest in automatic, computer-based tools has been spreading among forensic scientists and anthropologists wishing to extend the armamentarium of traditional statistical analysis and classification techniques. The combination of multiple paradigms is often required in order to fit the difficult, real-world scenarios involved in the area. The paper presents a comparison of combination techniques that exploit neural networks having a probabilistic interpretation within a Bayesian framework, either as models of class-posterior probabilities or as class-conditional density functions. Experiments are reported on a severe sex determination task relying on 1400 scout-view CT-scan images of human crania. It is shown that connectionist probability estimates yield higher accuracies than traditional statistical algorithms. Furthermore, the performance benefits from proper mixtures of neural models, and it turns out to be affected by the specific combination technique adopted.
Edmondo Trentin, Luca Lusnig, Fabio Cavalli

Classification of Emotional States in a Woz Scenario Exploiting Labeled and Unlabeled Bio-physiological Data

In this paper, a partially supervised machine learning approach is proposed for the recognition of emotional user states in HCI from bio-physiological data. To do so, an unsupervised learning preprocessing step is integrated into the training of a classifier. This makes it feasible to utilize unlabeled data or, as is done in this study, data that is labeled in categories other than those considered. Thus, the data is transformed into a new representation and a standard classifier approach is subsequently applied. Experimental evidence that such an approach is beneficial in this particular setting is provided using classification experiments. Finally, the results are discussed and arguments are given for when such a partially supervised approach is likely to yield robust and increased classification performance.
Martin Schels, Markus Kächele, David Hrabal, Steffen Walter, Harald C. Traue, Friedhelm Schwenker

Using Self Organizing Maps to Find Good Comparison Universities

Colleges and universities do not operate in a vacuum and they do not have a lock on “best practices”. As a result it is important to have other schools to use for “benchmark” comparisons. At the same time schools and their students change. What might have been good “benchmarks” in the past might not be appropriate in the future. This research demonstrates the viability of Self Organizing Maps (SOMs) as a means to find comparable institutions across many variables. An example of the approach shows which schools in the Council of Public Liberal Arts Colleges might be the best “benchmarks” for Fort Lewis College.
Cameron Cooper, Robert Kilmer

Sink Web Pages in Web Application

In this paper, we present the notion of sink web pages in a web application. These pages allow a reduced scheme of the web application to be identified, which can simplify the testing and verification of the entire web application. We believe that this notion can be useful in partially supervised learning.
Doru Anastasiu Popescu

