
2006 | Book

Artificial Neural Networks in Pattern Recognition

Second IAPR Workshop, ANNPR 2006, Ulm, Germany, August 31-September 2, 2006. Proceedings

Editors: Friedhelm Schwenker, Simone Marinai

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Unsupervised Learning

Simple and Effective Connectionist Nonparametric Estimation of Probability Density Functions
Abstract
Estimation of probability density functions (pdf) is one major topic in pattern recognition. Parametric techniques rely on an arbitrary assumption on the form of the underlying, unknown distribution. Nonparametric techniques remove this assumption. In particular, the Parzen Window (PW) relies on a combination of local window functions centered on the patterns of a training sample. Although effective, PW suffers from several limitations. Artificial neural networks (ANN) are, in principle, an alternative family of nonparametric models. ANNs are intensively used to estimate probabilities (e.g., class-posterior probabilities), but they have not been exploited so far to estimate pdfs. This paper introduces a simple neural-based algorithm for unsupervised, nonparametric estimation of pdfs, relying on PW. The approach overcomes the limitations of PW, possibly leading to improved pdf models. An experimental demonstration of the behavior of the algorithm w.r.t. PW is presented, using random samples drawn from a standard exponential pdf.
Edmondo Trentin
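For reference, the Parzen Window baseline the paper builds on can be sketched in a few lines; the Gaussian window function and the bandwidth h below are our own illustrative choices, not parameters taken from the paper:

```python
import numpy as np

def parzen_window_pdf(x, sample, h):
    """Parzen Window estimate of p(x): one Gaussian window function
    centered on each pattern of the training sample, averaged.

    x      : (d,) query point
    sample : (n, d) training patterns
    h      : bandwidth, the free smoothing parameter
    """
    n, d = sample.shape
    diffs = (x - sample) / h
    kernels = np.exp(-0.5 * np.sum(diffs ** 2, axis=1))
    return kernels.sum() / (n * (2 * np.pi) ** (d / 2) * h ** d)

# as in the paper's experiments: a sample from a standard exponential pdf
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=(200, 1))
print(parzen_window_pdf(np.array([1.0]), sample, h=0.3))
```

The fixed, global bandwidth h is exactly the kind of limitation the proposed neural estimator is meant to overcome.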
Comparison Between Two Spatio-Temporal Organization Maps for Speech Recognition
Abstract
In this paper, we compare two biologically inspired models that combine spatio-temporal data coding, representation, and processing. These models are based on the Self-Organizing Map (SOM), yielding a Spatio-Temporal Organization Map (STOM). More precisely, the map is trained using two different spatio-temporal algorithms rooted in biological research: ST-Kohonen and the Time-Organized Map (TOM). These algorithms use two kinds of spatio-temporal data coding: the first is based on the domain of complex numbers, while the second is based on the ISI (inter-spike interval). STOM is evaluated on speech recognition in order to assess its performance on such a time-variable application and to show that biological models are capable of giving results as good as stochastic and hybrid ones.
Zouhour Neji Ben Salem, Laurent Bougrain, Frédéric Alexandre
Adaptive Feedback Inhibition Improves Pattern Discrimination Learning
Abstract
Neural network models for unsupervised pattern recognition learning are challenged when the difference between the patterns of the training set is small. The standard neural network architecture for pattern recognition learning consists of adaptive forward connections and lateral inhibition, which provides competition between output neurons. We propose an additional adaptive inhibitory feedback mechanism to emphasize the difference between training patterns and improve learning. We present an implementation of adaptive feedback inhibition for spiking neural network models, based on spike timing dependent plasticity (STDP). When the inhibitory feedback connections are adjusted using an anti-Hebbian learning rule, feedback inhibition suppresses the redundant activity of input units which code the overlap between similar stimuli. We show that learning speed and pattern discriminability can be increased by adding this mechanism to the standard architecture.
Frank Michler, Thomas Wachtler, Reinhard Eckhorn
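For intuition only, a rate-based caricature of the anti-Hebbian rule for the inhibitory feedback connections; the paper's actual model is spiking and STDP-based, and the names and the clipping to non-positive weights here are our own assumptions:

```python
import numpy as np

def anti_hebbian_feedback_update(W_fb, pre, post, eta=0.01):
    """Where output (post) and input (pre) activity correlate, the
    feedback connection grows more inhibitory, suppressing the
    redundant activity that codes the overlap between similar stimuli."""
    W_fb = W_fb - eta * np.outer(post, pre)  # co-activity -> more inhibition
    return np.minimum(W_fb, 0.0)             # keep feedback purely inhibitory
```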

Semi-supervised Learning

Supervised Batch Neural Gas
Abstract
Recently, two extensions of neural gas have been proposed: a fast batch version of neural gas for data given in advance, and extensions of neural gas to learn a (possibly fuzzy) supervised classification. Here we propose a batch version of supervised neural gas training which allows a prototype-based classification to be learned efficiently, provided the training data are given beforehand. The method relies on a simpler cost function than online supervised neural gas and leads to simpler update formulas. We prove convergence of the algorithm in a general framework which also incorporates supervised k-means and supervised batch-SOM, and which opens the way towards metric adaptation as well as application to proximity data not embedded in a real-vector space.
Barbara Hammer, Alexander Hasenfuss, Frank-Michael Schleif, Thomas Villmann
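As background, a minimal sketch of the unsupervised batch neural gas update that the supervised batch version builds on; the supervised cost function additionally couples prototypes to class labels, which is not shown here, and the annealing schedule is an arbitrary choice:

```python
import numpy as np

def batch_neural_gas(X, n_prototypes=5, n_iter=50, lam0=2.0):
    """Batch neural gas for data X given in advance: in each pass every
    prototype is recomputed as a rank-weighted mean of all data points."""
    rng = np.random.default_rng(0)
    W = X[rng.choice(len(X), n_prototypes, replace=False)].copy()
    for t in range(n_iter):
        lam = lam0 * (0.01 / lam0) ** (t / max(n_iter - 1, 1))  # anneal range
        d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)  # (n, k) distances
        ranks = d2.argsort(axis=1).argsort(axis=1)           # rank per prototype
        h = np.exp(-ranks / lam)                             # neighborhood weights
        W = (h.T @ X) / h.sum(axis=0)[:, None]               # batch update
    return W
```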
Fuzzy Labeled Self-Organizing Map with Label-Adjusted Prototypes
Abstract
We extend the self-organizing map (SOM), in the form proposed by Heskes, to a supervised fuzzy classification method. On the one hand, this leads to a robust classifier that allows efficient learning from fuzzy labeled or partially contradictory data. On the other hand, integrating the labels into the location of prototypes in the SOM leads to a visualization of those parts of the data relevant for the classification.
Thomas Villmann, Udo Seiffert, Frank-Michael Schleif, Cornelia Brüß, Tina Geweniger, Barbara Hammer
On the Effects of Constraints in Semi-supervised Hierarchical Clustering
Abstract
We explore the use of constraints with divisive hierarchical clustering and discuss the effects of including constraints in the hierarchical clustering process. Furthermore, we introduce an implementation of a semi-supervised divisive hierarchical clustering algorithm and show the influence of constraints on the divisive hierarchical clustering process. Our main interest here lies in building stable dendrograms when clustering with different subsets of data.
Hans A. Kestler, Johann M. Kraus, Günther Palm, Friedhelm Schwenker
A Study of the Robustness of KNN Classifiers Trained Using Soft Labels
Abstract
Supervised learning models most commonly use crisp labels for classifier training. Crisp labels fail to capture the data characteristics when overlapping classes exist. In this work we compare learning with soft labels against learning with hard (crisp) labels for training K-nearest neighbor classifiers. We propose a new technique to generate soft labels based on fuzzy clustering of the data and fuzzy relabelling of cluster prototypes. Experiments were conducted on five data sets to compare classifiers that learn using different types of soft labels with classifiers that learn using crisp labels. Results reveal that learning with soft labels is more robust against label errors than learning with crisp labels. The proposed technique for deriving soft labels from the data was also found to lead to more robust training on most data sets investigated.
Neamat El Gayar, Friedhelm Schwenker, Günther Palm
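One plausible reading of KNN classification with soft labels, sketched under our own assumptions (the paper's soft-label generation via fuzzy clustering and prototype relabelling is not reproduced here):

```python
import numpy as np

def soft_knn_predict(X_train, soft_labels, x, k=5):
    """Average the soft (fuzzy) label vectors of the k nearest training
    points; with one-hot label rows this reduces to ordinary KNN voting.

    soft_labels : (n, c) class-membership rows, each summing to 1
    """
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    membership = soft_labels[nearest].mean(axis=0)
    return membership.argmax(), membership
```

A mislabeled neighbor contributes only a graded membership vector rather than a full hard vote, which is one intuition for the robustness the paper reports.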

Supervised Learning

An Experimental Study on Training Radial Basis Functions by Gradient Descent
Abstract
In this paper, we present experiments comparing different training algorithms for Radial Basis Function (RBF) neural networks. In particular we compare the classical training, which consists of unsupervised training of the centers followed by supervised training of the output weights, with the fully supervised training by gradient descent proposed recently in some papers. We conclude that fully supervised training generally performs better. We also compare batch training with online training and conclude that online training reduces the number of iterations.
Joaquín Torres-Sospedra, Carlos Hernández-Espinosa, Mercedes Fernández-Redondo
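A minimal sketch of one online step of the fully supervised variant, where centers, widths, and output weights are all adjusted by gradient descent on the squared error; the learning rate and parameter shapes are illustrative assumptions:

```python
import numpy as np

def rbf_online_step(x, t, C, sig, w, lr=0.01):
    """One stochastic gradient step for an RBF network y = sum_j w_j *
    exp(-||x - c_j||^2 / (2 sig_j^2)) on a single pattern (x, t).
    C: (k, d) centers, sig: (k,) widths, w: (k,) output weights."""
    diff = x - C                              # (k, d)
    d2 = (diff ** 2).sum(axis=1)              # squared distances to centers
    phi = np.exp(-d2 / (2 * sig ** 2))        # hidden-unit activations
    err = t - phi @ w                         # output error
    w += lr * err * phi                                   # output weights
    C += lr * err * (w * phi / sig ** 2)[:, None] * diff  # centers
    sig += lr * err * w * phi * d2 / sig ** 3             # widths
    return 0.5 * err ** 2
```

Classical training would instead fix C (e.g., by clustering) and solve only for w; batch training accumulates these gradients over the whole training set before updating.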
A Local Tangent Space Alignment Based Transductive Classification Algorithm
Abstract
LTSA (local tangent space alignment) is a recently proposed method for manifold learning, which can efficiently learn a nonlinear low-dimensional embedding of high-dimensional data and can also reconstruct high-dimensional coordinates from the embedding coordinates. However, it ignores the label information conveyed by data samples and cannot be used for classification directly. In this paper, a transductive manifold classification method, called QLAT (LDA/QR and LTSA based Transductive classifier), is presented, which is based on LTSA and TCM-KNN (transduction confidence machine-k nearest neighbor). In the algorithm, local low-dimensional coordinates are constructed using a 2-stage LDA/QR method, which not only utilizes the label information of the sample data but also overcomes the singularity problem of traditional LDA; the global low-dimensional embedding manifold is then obtained by local affine transforms, and finally TCM-KNN is used for classification on the low-dimensional manifold. Experiments on mixed labeled and unlabeled data sets illustrate the effectiveness of the method.
Jianwei Yin, Xiaoming Liu, Zhilin Feng, Jinxiang Dong
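The plain LTSA building block (without the paper's LDA/QR and TCM-KNN extensions) is available off the shelf; a minimal sketch using scikit-learn, with the Swiss roll standing in for the paper's data sets:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# LTSA: learn a nonlinear 2-D embedding of 3-D manifold data
X, _ = make_swiss_roll(n_samples=1000, random_state=0)
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="ltsa")
Y = ltsa.fit_transform(X)   # (1000, 2) low-dimensional coordinates
```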
Incremental Manifold Learning Via Tangent Space Alignment
Abstract
Several algorithms have been proposed to analyze the structure of high-dimensional data based on the notion of manifold learning. They have been used to extract the intrinsic characteristics of different types of high-dimensional data by performing nonlinear dimensionality reduction. Most of them operate in a "batch" mode and cannot be efficiently applied when data are collected sequentially. In this paper, we propose an incremental version (ILTSA) of LTSA (Local Tangent Space Alignment), which is one of the key manifold learning algorithms. In addition, a landmark version of LTSA (LLTSA) is proposed, where landmarks are selected based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with the l1 norm. Furthermore, an incremental version (ILLTSA) of LLTSA is also proposed. Experimental results on synthetic data and real-world data sets demonstrate the effectiveness of our algorithms.
Xiaoming Liu, Jianwei Yin, Zhilin Feng, Jinxiang Dong
A Convolutional Neural Network Tolerant of Synaptic Faults for Low-Power Analog Hardware
Abstract
Recently, the authors described a training method for a convolutional neural network of threshold neurons. Hidden layers are trained by clustering, in a feed-forward manner, while the output layer is trained using the supervised Perceptron rule. The system is designed for implementation on an existing low-power analog hardware architecture, exhibiting inherent error sources affecting the computation accuracy in unspecified ways. One key technique is to train the network on-chip, taking possible errors into account without any need to quantify them. For the hidden layers, an on-chip approach has been applied previously. In the present work, a chip-in-the-loop version of the iterative Perceptron rule is introduced for training the output layer. The influence of various types of errors (noisy, deleted, and clamped weights) is thoroughly investigated for all network layers, using the MNIST database of hand-written digits as a benchmark.
Johannes Fieres, Karlheinz Meier, Johannes Schemmel
Ammonium Estimation in a Biological Wastewater Plant Using Feedforward Neural Networks
Abstract
Mathematical models are normally used to calculate the component concentrations in biological wastewater treatment. However, this work deals with the wastewater from a coke plant, which involves inhibition effects between components that do not permit the use of such mathematical models. For this reason, feed-forward neural networks were used to estimate the ammonium concentration in the effluent stream of the biological plant. The architecture of the neural network is based on previous work on this topic. The methodology consists of testing different hidden-layer sizes and different subsets of input variables.
Hilario López García, Iván Machón González

Support Vector Learning

Support Vector Regression Using Mahalanobis Kernels
Abstract
In our previous work we have shown that Mahalanobis kernels are useful for support vector classifiers in terms of both generalization ability and model selection speed. In this paper we propose using Mahalanobis kernels for function approximation. We determine the covariance matrix for the Mahalanobis kernel using all the training data. Model selection is done by line search: first the margin parameter and the error threshold are optimized, and then the kernel parameter is optimized. According to computer experiments on four benchmark problems, the estimation performance of a Mahalanobis kernel with a diagonal covariance matrix optimized by line search is comparable to or better than that of an RBF kernel optimized by grid search.
Yuya Kamada, Shigeo Abe
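A sketch of the kernel in question; the covariance matrix is estimated from all the training data as the abstract states, while the scaling by the input dimension and the single parameter delta follow one common convention and may differ from the paper's exact normalization:

```python
import numpy as np

def mahalanobis_kernel(X, Y, cov, delta=1.0):
    """K[i, j] = exp(-delta / d * (x_i - y_j)^T cov^{-1} (x_i - y_j))."""
    d = X.shape[1]
    P = np.linalg.inv(cov)
    diff = X[:, None, :] - Y[None, :, :]
    q = np.einsum('ijk,kl,ijl->ij', diff, P, diff)  # squared Mahalanobis distances
    return np.exp(-delta * q / d)

# the diagonal-covariance variant tuned by line search in the paper would
# use cov = np.diag(X_train.var(axis=0))
```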
Incremental Training of Support Vector Machines Using Truncated Hypercones
Abstract
We discuss incremental training of support vector machines in which we approximate the regions, where support vector candidates exist, by truncated hypercones. We generate the truncated surface with the center being the center of unbounded support vectors and with the radius being the maximum distance from the center to support vectors. We determine the hypercone surface so that it includes a datum, which is far away from the separating hyperplane. Then to cope with non-separable cases, we shift the truncated hypercone along the rotating axis in parallel in the opposite direction of the separating hyperplane. We delete the data that are in the truncated hypercone and keep the remaining data as support vector candidates. In computer experiments, we show that we can delete many data without deteriorating the generalization ability.
Shinya Katagiri, Shigeo Abe
Fast Training of Linear Programming Support Vector Machines Using Decomposition Techniques
Abstract
Decomposition techniques are used to speed up training of support vector machines, but for linear programming support vector machines (LP-SVMs) a direct implementation of decomposition techniques leads to infinite loops. To solve this problem and to further speed up training, we propose an improved decomposition technique for training LP-SVMs. If an infinite loop is detected, we include in the next working set all the data in the working sets that form the infinite loop. To further accelerate training, we improve the working set selection strategy: at each iteration step, we check the number of violations of complementarity conditions and constraints. If the number of violations increases, we conclude that important data have been removed from the working set and restore those data to the working set. Computer experiments demonstrate that training with the proposed decomposition technique and improved working set selection is drastically faster than training without the decomposition technique, and that it is always faster than training without the improved working set selection for all the cases tested.
Yusuke Torii, Shigeo Abe

Multiple Classifier Systems

Multiple Classifier Systems for Embedded String Patterns
Abstract
Multiple classifier systems are a well proven and tested instrument for enhancing the recognition accuracy in statistical pattern recognition problems. However, only little work has been reported on combining classifiers in structural pattern recognition. In this paper we describe a method for embedding strings into real vector spaces based on prototype selection, in order to obtain several vectorial descriptions of the string data. We present methods for combining multiple classifiers trained on the various vectorial data representations. As base classifiers we use nearest neighbor methods and support vector machines. In our experiments we demonstrate that this approach can significantly improve the classification accuracy of string patterns.
Barbara Spillmann, Michel Neuhaus, Horst Bunke
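The embedding step can be made concrete: each string is mapped to the vector of its edit distances to a set of selected prototype strings, so that different prototype sets yield the different vectorial representations the classifiers are combined over. A sketch (the paper's prototype selection strategies are not reproduced here):

```python
import numpy as np

def levenshtein(a, b):
    """Standard string edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def embed_strings(strings, prototypes):
    """One vectorial description: distances to each prototype string."""
    return np.array([[levenshtein(s, p) for p in prototypes]
                     for s in strings])
```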
Multiple Neural Networks for Facial Feature Localization in Orientation-Free Face Images
Abstract
We present in this paper a new facial feature localizer. It uses a kind of auto-associative neural network trained to localize specific facial features (like eyes and mouth corners) in orientation-free faces. One possible extension is presented where several specialized detectors are trained to deal with each face orientation. To select the best localization hypothesis, we combine radiometric and probabilistic information. The method is quite fast and accurate. The mean localization error (estimated on more than 700 test images) is lower than 9%.
Lionel Prevost, Rachid Belaroussi, Maurice Milgram
Hierarchical Neural Networks Utilising Dempster-Shafer Evidence Theory
Abstract
Hierarchical neural networks show many benefits when employed for classification problems even when only simple methods analogous to decision trees are used to retrieve the classification result. More complex ways of evaluating the hierarchy output that take into account the complete information the hierarchy provides yield improved classification results. Due to the hierarchical output space decomposition that is inherent to hierarchical neural networks the usage of Dempster-Shafer evidence theory suggests itself as it allows for the representation of evidence at different levels of abstraction. Moreover, it provides the possibility to differentiate between uncertainty and ignorance. The proposed approach has been evaluated using three different data sets and showed consistently improved classification results compared to the simple decision-tree-like retrieval method.
Rebecca Fay, Friedhelm Schwenker, Christian Thiel, Günther Palm
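The core operation the hierarchy output evaluation relies on is Dempster's rule of combination; a minimal sketch for mass functions over a frame of class hypotheses (the mapping from network outputs to masses is the paper's contribution and is not modeled here):

```python
def dempster_combine(m1, m2):
    """Combine two mass functions given as dicts mapping frozenset
    hypotheses to masses; conflicting mass is renormalized away."""
    combined, conflict = {}, 0.0
    for A, ma in m1.items():
        for B, mb in m2.items():
            C = A & B
            if C:
                combined[C] = combined.get(C, 0.0) + ma * mb
            else:
                conflict += ma * mb          # mass falling on the empty set
    return {A: m / (1.0 - conflict) for A, m in combined.items()}

# evidence on classes {a, b}; mass on the full frame expresses ignorance
m1 = {frozenset('a'): 0.6, frozenset('ab'): 0.4}
m2 = {frozenset('b'): 0.3, frozenset('ab'): 0.7}
print(dempster_combine(m1, m2))
```

Assigning mass to sets of classes at inner nodes of the hierarchy is what lets the approach distinguish uncertainty (mass spread over singletons) from ignorance (mass on larger sets).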
Combining MF Networks: A Comparison Among Statistical Methods and Stacked Generalization
Abstract
The two key factors in designing an ensemble of neural networks are how to train the individual networks and how to combine the different outputs into a single output. In this paper we focus on the combination module. We have previously proposed two methods based on Stacked Generalization as the combination module of an ensemble of neural networks. Here we perform a comparison between the two versions of Stacked Generalization and six statistical combination methods in order to find the best combination method, using the mean increase of performance and the mean percentage of error reduction for the comparison. The results show that the methods based on Stacked Generalization are better than the classical combiners.
Joaquín Torres-Sospedra, Carlos Hernández-Espinosa, Mercedes Fernández-Redondo
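A sketch of Stacked Generalization as a combination module, under our own assumptions (scikit-learn estimators and a linear level-1 combiner; the paper's two versions differ in details not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# level-0: an ensemble of feedforward networks of different sizes
nets = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=1000, random_state=0)
        for h in (8, 16, 32)]

def fit_stacked_combiner(nets, X, y):
    """Out-of-fold class-probability outputs of the level-0 networks
    become the input features of the level-1 combiner."""
    meta_X = np.hstack([cross_val_predict(net, X, y, cv=5,
                                          method='predict_proba')
                        for net in nets])
    return LogisticRegression(max_iter=1000).fit(meta_X, y)
```

The out-of-fold predictions matter: training the combiner on the networks' outputs for their own training data would overestimate their reliability.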

Visual Object Recognition

Object Detection and Feature Base Learning with Sparse Convolutional Neural Networks
Abstract
A new convolutional neural network model termed sparse convolutional neural network (SCNN) is presented and its usefulness for real-time object detection in gray-valued, monocular video sequences is demonstrated. SCNNs are trained on "raw" gray values and are intended to perform feature selection as a part of regular neural network training. For this purpose, the learning rule is extended by an unsupervised component which performs a local nonlinear principal components analysis: in this way, meaningful and diverse properties can be computed from local image patches. The SCNN model can be used to train classifiers for different object classes which share a common first layer, i.e., a common preprocessing. This is advantageous since the shared information needs to be calculated only once for all classifiers. It is further demonstrated how SCNNs can be implemented by successive convolutions of the input image: scanning an image for objects at all possible locations is shown to be possible in real-time using this technique.
Alexander R. T. Gepperth
Visual Classification of Images by Learning Geometric Appearances Through Boosting
Abstract
We present a multiclass classification system for gray value images through boosting. The feature selection is done using the LPBoost algorithm which selects suitable features of adequate type. In our experiments we use up to nine different kinds of feature types simultaneously. Furthermore, a greedy search strategy within the weak learner is used to find simple geometric relations between selected features from previous boosting rounds. The final hypothesis can also consist of more than one geometric model for an object class. Finally, we provide a weight optimization method for combining the learned one-vs-one classifiers for the multiclass classification. We tested our approach on a publicly available data set and compared our results to other state-of-the-art approaches, such as the "bag of keypoints" method.
Martin Antenreiter, Christian Savu-Krohn, Peter Auer
An Eye Detection System Based on Neural Autoassociators
Abstract
Automatic eye tracking is a challenging task, with numerous applications in biometrics, security, intelligent human–computer interfaces, and driver's sleepiness detection systems. Eye localization and extraction is therefore the first step towards the solution of such problems. In this paper, we present a new method, based on neural autoassociators, to solve the problem of detecting eyes in a facial image. A subset of the AR Database, containing individuals both with and without glasses and with open or closed eyes, has been used for experiments and benchmarking. Preliminary experimental results are very promising and demonstrate the efficiency of the proposed eye localization system.
Monica Bianchini, Lorenzo Sarti
Orientation Histograms for Face Recognition
Abstract
In this paper we present a method to recognize human faces based on histograms of local orientation. Orientation histograms were used as input feature vectors for a k-nearest neighbour classifier. We present a method to calculate orientation histograms of n×n subimages partitioning the 2D camera image containing the segmented face. Numerical experiments were performed using the Olivetti Research Laboratory (ORL) database containing 400 images of 40 subjects. Remarkable recognition rates of 98% to 99% were achieved with this extremely simple approach.
Friedhelm Schwenker, Andreas Sachs, Günther Palm, Hans A. Kestler
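The feature extraction is simple enough to sketch in full; the grid size and bin count below are illustrative choices (the paper partitions the face image into n×n subimages):

```python
import numpy as np

def orientation_histogram(img, n_bins=8):
    """Histogram of local gradient orientations for one (sub)image,
    weighted by gradient magnitude and normalized."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def face_feature(img, n=4, n_bins=8):
    """Concatenated orientation histograms of an n x n grid of subimages
    of the segmented face; the input vector for the k-NN classifier."""
    h, w = img.shape
    return np.concatenate([
        orientation_histogram(img[i * h // n:(i + 1) * h // n,
                                  j * w // n:(j + 1) * w // n], n_bins)
        for i in range(n) for j in range(n)])
```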

Data Mining in Bioinformatics

An Empirical Comparison of Feature Reduction Methods in the Context of Microarray Data Classification
Abstract
The differentiation between cancerous and benign processes in the body often poses a difficult diagnostic problem in the clinical setting while being of major importance for the treatment of patients. Measuring the expression of a large number of genes with DNA microarrays may serve this purpose. While the expression level of several thousands of genes can be measured in a single experiment, only a few dozens of experiments are normally carried out, leading to data sets of very high dimensionality and low cardinality. In this situation, feature reduction techniques capable of reducing the dimensionality of data are essential for building predictive tools based on classification.
Methods and Data: We compare the popular feature selection and classification method PAM (Tibshirani et al.) to several other methods. Feature reduction and feature ranking methods, such as Random Projection, Random Feature Selection, Area under the ROC curve and PCA are applied. We employ these together with the classification component of PAM, Linear Discriminant Analysis (LDA), a Nearest Prototype (NP) classifier and linear support vector machines (SVMs). We apply these methods to three publicly available linearly separable gene expression data sets of varying cardinality and dimensionality.
Results and Conclusions: In our experiments with the gene expression data we could not discover a clearly superior algorithm; instead, somewhat surprisingly, we found that feature reduction using random projections or selections often performed equally well.
Hans A. Kestler, Christoph Müssel
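Given the finding that random projections hold their own, the baseline is worth stating concretely; a sketch, with a Gaussian projection matrix as one standard choice:

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Reduce (n, d) expression data to k features by projecting onto
    random Gaussian directions; cheap even for d in the thousands, and
    usable as input to LDA, nearest-prototype, or linear SVM classifiers."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R
```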
Unsupervised Feature Selection for Biomarker Identification in Chromatography and Gene Expression Data
Abstract
A novel approach to feature selection from unlabeled vector data is presented. It is based on the reconstruction of original data relationships in an auxiliary space with either weighted or omitted features. Feature weighting, on the one hand, is related to the return forces of factors in a parametric data similarity measure in response to disturbance of their optimum values. Feature omission, on the other hand, inducing a measurable loss of reconstruction quality, is realized in an iterative greedy way. The proposed framework allows custom data similarity measures to be applied. Here, adaptive Euclidean distance and adaptive Pearson correlation are considered, the former serving as a standard reference, the latter being useful for intensity data. Results of the different strategies are given for chromatography and gene expression data.
Marc Strickert, Nese Sreenivasulu, Silke Peterek, Winfriede Weschke, Hans-Peter Mock, Udo Seiffert
Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles
Abstract
Microarray technologies are increasingly being used in the biological and medical sciences for high-throughput analyses of genetic information on the genome, transcriptome and proteome levels. The differentiation between cancerous and benign processes in the body often poses a difficult diagnostic problem in the clinical setting while being of major importance for the treatment of patients. In this situation, feature reduction techniques capable of reducing the dimensionality of data are essential for building predictive tools based on classification. We extend the set covering machine of Marchand and Shawe-Taylor to data-dependent rays in order to achieve feature reduction and a direct interpretation of the found conjunctions of intervals on individual genes. We give bounds for the generalization error as a function of the amount of data compression and the number of training errors achieved during training. In experiments with artificial data and a real-world data set of gene expression profiles from the pancreas we show the utility of the approach and its applicability to microarray data classification.
Hans A. Kestler, Wolfgang Lindner, André Müller
Backmatter
Metadata
Title
Artificial Neural Networks in Pattern Recognition
Editors
Friedhelm Schwenker
Simone Marinai
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-37952-2
Print ISBN
978-3-540-37951-5
DOI
https://doi.org/10.1007/11829898
