
2007 | Book

Artificial Neural Networks – ICANN 2007

17th International Conference, Porto, Portugal, September 9-13, 2007, Proceedings, Part I

Edited by: Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, Danilo Mandic

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Learning Theory

Generalization Error of Automatic Relevance Determination

Automatic relevance determination (ARD) shows good performance in many applications. Recently, it has been applied to brain current estimation with the variational method. Although users of ARD tend to pay attention to one of its benefits, sparsity, in this paper we focus on another benefit: generalization. We clarify the generalization error of ARD when a particular class of prior distributions is used, and show that good generalization is caused by the singularities of ARD. Although sparsity is not observed in that case, the mechanism by which the singularities provide good generalization suggests the mechanism by which they also provide sparsity.

Shinichi Nakajima, Sumio Watanabe
On a Singular Point to Contribute to a Learning Coefficient and Weighted Resolution of Singularities

Many learning machines that have hidden variables or hierarchical structures are singular statistical models. Their learning performance differs from that of regular statistical models. In this paper, we show that the learning coefficient is easily computed by weighted blow-up, and that, in contrast, there are cases in which the learning coefficient cannot be correctly computed by blowing up at the origin only.

Takeshi Matsuda, Sumio Watanabe
Improving the Prediction Accuracy of Echo State Neural Networks by Anti-Oja’s Learning

Echo state neural networks, which are a special case of recurrent neural networks, are studied from the viewpoint of their learning ability, with the goal of achieving greater prediction ability. Standard training of these neural networks uses the pseudoinverse matrix for one-step learning of the weights from hidden to output neurons. This regular adaptation of echo state neural networks was optimized by updating the weights of the dynamic reservoir with anti-Oja's learning. Echo state neural networks use the dynamics of this massive and randomly initialized dynamic reservoir to extract interesting properties of incoming sequences. The approach was tested on laser fluctuation and Mackey-Glass time series prediction. The prediction error achieved by this approach was substantially smaller than that achieved by the standard algorithm.

Štefan Babinec, Jiří Pospíchal
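
The abstract does not spell out the anti-Oja update itself; the following is a minimal sketch of how an anti-Hebbian (anti-Oja) decorrelation step on the reservoir weights might look, with the learning rate, reservoir size and driving noise all being illustrative assumptions rather than the authors' settings:

```python
import numpy as np

def anti_oja_update(W, x_prev, x_curr, eta=1e-4):
    """One anti-Oja step on reservoir weights W.

    Oja's rule strengthens correlated activity; anti-Oja negates it,
    decorrelating reservoir units:
        dW[i, j] = -eta * x_curr[i] * (x_prev[j] - x_curr[i] * W[i, j])
    """
    outer = np.outer(x_curr, x_prev)
    decay = (x_curr ** 2)[:, None] * W
    return W - eta * (outer - decay)

# Toy usage: a small random reservoir driven by noise.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(50, 50))
x = rng.normal(size=50)
for _ in range(100):
    x_new = np.tanh(W @ x + rng.normal(scale=0.01, size=50))
    W = anti_oja_update(W, x, x_new)
    x = x_new
```
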
Theoretical Analysis of Accuracy of Gaussian Belief Propagation

Belief propagation (BP) is a calculation method which enables us to obtain marginal probabilities at a tractable computational cost. BP is known to provide true marginal probabilities when the graph describing the target distribution has a tree structure, and approximate marginal probabilities when the graph has loops. The accuracy of loopy belief propagation (LBP) has been studied. In this paper, we focus on applying LBP to a multi-dimensional Gaussian distribution and analytically show how accurate LBP is in some cases.

Yu Nishiyama, Sumio Watanabe
Relevance Metrics to Reduce Input Dimensions in Artificial Neural Networks

The reduction of input dimensionality is an important subject in modelling, knowledge discovery and data mining. Indeed, an appropriate combination of inputs is desirable in order to obtain better generalisation capabilities with the models. There are several approaches to perform input selection. In this work we will deal with techniques guided by measures of input relevance or input sensitivity. Six strategies to assess input relevance were tested over four benchmark datasets using a backward selection wrapper. The results show that a group of techniques produces input combinations with better generalisation capabilities even if the implemented wrapper does not compute any measure of generalisation performance.

Héctor F. Satizábal M., Andres Pérez-Uribe
An Improved Greedy Bayesian Network Learning Algorithm on Limited Data

Although encouraging results have been reported, existing Bayesian network (BN) learning algorithms have trouble with limited data. A statistical or information-theoretic measure or a score function may be unreliable on limited datasets, which affects learning accuracy. To alleviate this problem, we propose a novel BN learning algorithm, MRMRG (Max-Relevance and Min-Redundancy Greedy). MRMRG applies the Max-Relevance and Min-Redundancy feature selection technique and proposes a Local Bayesian Increment (LBI) function based on the Bayesian Information Criterion (BIC) formula and the likelihood property of overfitting. Experimental results show that MRMRG has much better accuracy than most existing BN learning algorithms when learning BNs from limited datasets.

Feng Liu, Fengzhan Tian, Qiliang Zhu
Incremental One-Class Learning with Bounded Computational Complexity

An incremental one-class learning algorithm is proposed for the purpose of outlier detection. Outliers are identified by estimating, and thresholding, the probability distribution of the training data. In the early stages of training, a non-parametric estimate of the training data distribution is obtained using kernel density estimation. Once the number of training examples reaches the maximum computationally feasible limit for kernel density estimation, we treat the kernel density estimate as a maximally complex Gaussian mixture model, and keep the model complexity constant by merging a pair of components for each new kernel added. This method is shown to outperform a current state-of-the-art incremental one-class learning algorithm (incremental SVDD [5]) on a variety of datasets, while requiring only an upper limit on model complexity to be specified.

Rowland R. Sillito, Robert B. Fisher
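
The constant-complexity step can be illustrated with the standard moment-preserving merge of two Gaussian mixture components; the formulas below are the usual ones, while the choice of which pair to merge is left open here:

```python
import numpy as np

def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Moment-preserving merge of two 1-D Gaussian mixture components.

    The merged component keeps the pair's total weight, mean and
    variance, so the mixture's moments up to order two are preserved.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2)
           + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var

# Example: merging two nearby kernels of a kernel density estimate.
print(merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))  # (1.0, 1.0, 2.0)
```
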
Estimating the Size of Neural Networks from the Number of Available Training Data

Estimating a priori the size of neural networks for achieving high classification accuracy is a hard problem. Existing studies provide theoretical upper bounds on the size of neural networks that are unrealistic to implement. This work provides a computational study for estimating the size of neural networks, using the size of the available training data as an estimation parameter. We also show that the size of a neural network is problem dependent and that one only needs the number of available training data to determine the size of the network required for achieving a high classification rate. We use for our experiments a threshold neural network that combines the perceptron algorithm with simulated annealing, and we test our results on datasets from the UCI Machine Learning Repository. Based on our experimental results, we propose a formula to estimate the number of perceptrons that have to be trained in order to achieve a high classification accuracy.

Georgios Lappas
A Maximum Weighted Likelihood Approach to Simultaneous Model Selection and Feature Weighting in Gaussian Mixture

This paper aims to identify the clustering structure and the relevant features automatically and simultaneously in the context of the Gaussian mixture model. We perform this task by introducing two sets of weight functions under the recently proposed Maximum Weighted Likelihood (MWL) learning framework. One set is to reward the significance of each component in the mixture, and the other is to discriminate the relevance of each feature to the cluster structure. Experiments on both synthetic and real-world data show the efficacy of the proposed algorithm.

Yiu-ming Cheung, Hong Zeng
Estimation of Poles of Zeta Function in Learning Theory Using Padé Approximation

Learning machines such as neural networks, Gaussian mixtures, Bayes networks, hidden Markov models, and Boltzmann machines are called singular learning machines; they have been applied to many real problems such as pattern recognition, time-series prediction, and system control. However, these learning machines have singular points attributable to their hierarchical structures or symmetry properties. Hence, the maximum likelihood estimators do not have asymptotic normality, and conventional asymptotic theory for regular statistical models cannot be applied. Therefore, theoretically optimal model selection or design involves algebraic geometrical analysis. The algebraic geometrical analysis requires blowing up in order to obtain the maximum poles of zeta functions in learning theory, which is hard for complex learning machines. In this paper, a new method which obtains the maximum poles of zeta functions in learning theory by numerical computation is proposed, and its effectiveness is shown by experimental results.

Ryosuke Iriguchi, Sumio Watanabe
Neural Network Ensemble Training by Sequential Interaction

Neural network ensembles (NNEs) have been shown to outperform single neural networks (NNs) in terms of generalization ability. The performance of an NNE therefore depends on good diversity among its component NNs. Popular NNE methods, such as bagging and boosting, use data sampling techniques to achieve diversity. In such methods, each NN is trained independently on a particular training set that is probabilistically created. Due to this independent training strategy, there is a lack of interaction among component NNs. To achieve training-time interaction, negative correlation learning (NCL) has been proposed for simultaneous training. NCL demands direct communication among component NNs, which is not possible in bagging and boosting. In this study, we first modify NCL from a simultaneous to a sequential style and then introduce it into bagging and boosting for interaction purposes. Empirical studies show that sequential training-time interaction increased diversity among component NNs and outperformed conventional methods in generalization ability.

M. A. H. Akhand, Kazuyuki Murase
Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to give a less biased estimator of the Bellman residual and to facilitate the regression character of NRR, we incorporate an improved, auxiliared Bellman residual [2] and provide, to the best of our knowledge, the first neural-network-based implementation of this novel Bellman residual minimisation technique. Furthermore, we extend NRR to Policy Gradient Neural Rewards Regression (PGNRR), where the strategy is directly encoded by a policy network. PGNRR profits from both the data-efficiency of the rewards regression approach and the directness of policy search methods. PGNRR further overcomes a crucial drawback of NRR by considerably extending the applicable problem class to continuous action spaces.

Daniel Schneegaß, Steffen Udluft, Thomas Martinetz

Advances in Neural Network Learning Methods

Structure Learning with Nonparametric Decomposable Models

We present a novel approach to structure learning for graphical models. By using nonparametric estimates to model clique densities in decomposable models, both discrete and continuous distributions can be handled in a unified framework. Also, consistency of the underlying probabilistic model is guaranteed. Model selection is based on predictive assessment, with efficient algorithms that allow fast greedy forward and backward selection within the class of decomposable models. We show the validity of this structure learning approach on toy data, and on two large sets of gene expression data.

Anton Schwaighofer, Mathäus Dejori, Volker Tresp, Martin Stetter
Recurrent Bayesian Reasoning in Probabilistic Neural Networks

Considering the probabilistic approach to neural networks in the framework of statistical pattern recognition we assume approximation of class-conditional probability distributions by finite mixtures of product components. The mixture components can be interpreted as probabilistic neurons in neurophysiological terms and, in this respect, the fixed probabilistic description becomes conflicting with the well known short-term dynamic properties of biological neurons. We show that some parameters of PNN can be “released” for the sake of dynamic processes without destroying the statistically correct decision making. In particular, we can iteratively adapt the mixture component weights or modify the input pattern in order to facilitate the correct recognition.

Jiří Grim, Jan Hora
Resilient Approximation of Kernel Classifiers

Trained support vector machines (SVMs) have a slow run-time classification speed if the classification problem is noisy and the sample data set is large. Approximating the SVM by a sparser function has been proposed to solve this problem. In this study, different variants of approximation algorithms are empirically compared. It is shown that gradient descent using the improved Rprop algorithm increases the robustness of the method compared to fixed-point iteration. Three different heuristics for selecting the support vectors to be used in the construction of the sparse approximation are proposed. It turns out that none is superior to random selection. The effect of a finishing gradient descent on all parameters of the sparse approximation is studied.

Thorsten Suttorp, Christian Igel
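
Rprop itself is a standard sign-based optimizer; a compact sketch of a Rprop-style step (without the backtracking variants, and with the usual but assumed constants 1.2 and 0.5) is:

```python
import numpy as np

def rprop_step(params, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=1.0):
    """One Rprop-style update with per-parameter step sizes.

    Step sizes grow where successive gradients agree in sign and shrink
    where they disagree; the update uses only the gradient's sign.
    """
    agree = np.sign(grad) == np.sign(prev_grad)
    step = np.where(agree,
                    np.minimum(step * eta_plus, step_max),
                    np.maximum(step * eta_minus, step_min))
    return params - np.sign(grad) * step, step

# Toy usage on f(w) = ||w||^2, whose gradient is 2w.
w, step, g_prev = np.ones(3), np.full(3, 0.1), np.zeros(3)
for _ in range(20):
    g = 2 * w
    w, step = rprop_step(w, g, g_prev, step)
    g_prev = g
```
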
Incremental Learning of Spatio-temporal Patterns with Model Selection

This paper proposes a biologically inspired incremental learning method for spatio-temporal patterns based on our recently reported “Incremental learning through sleep (ILS)” method. This method alternately repeats two learning phases: awake and sleep. During the awake phase, the system learns new spatio-temporal patterns by rote, whereas in the sleep phase, it rehearses the recorded new memories interleaved with old memories. The rehearsal process is essential for reconstructing the internal representation of the neural network so as not only to memorize the new patterns while keeping old memories but also to reduce redundant hidden units. By using this strategy, the neural network achieves high generalization ability.

The most attractive property of the method is the incremental learning ability of non-independent distributed samples without catastrophic forgetting despite using a small amount of resources. We applied our method to an experiment on robot control signals, which vary depending on the context of the current situation.

Koichiro Yamauchi, Masayoshi Sato
Accelerating Kernel Perceptron Learning

Recently it has been shown that appropriate perceptron training methods, such as the Schlesinger–Kozinec (SK) algorithm, can provide maximal margin hyperplanes with training costs O(N × T), with N denoting sample size and T the number of training iterations. In this work we shall relate SK training with the classical Rosenblatt rule and show that, when the hyperplane vector is written in dual form, the support vector (SV) coefficients determine their training appearance frequency; in particular, large-coefficient SVs penalize training costs. In this light we shall explore a training acceleration procedure in which large-coefficient and, hence, large-cost SVs are removed from training, which allows for a further stable large-sample shrinking. As we shall see, this results in much faster training while not penalizing test classification.

Daniel García, Ana González, José R. Dorronsoro
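
A sketch of the dual-form kernel perceptron with removal of large-coefficient support vectors follows; the RBF kernel and the removal threshold are illustrative assumptions, not the authors' procedure:

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def dual_perceptron(X, y, epochs=50, drop_coef=20):
    """Kernel perceptron in dual form.

    alpha[i] counts how often example i triggered an update; examples
    whose coefficient grows past drop_coef are excluded from further
    training (they still contribute to the decision function).
    """
    n = len(y)
    alpha = np.zeros(n)
    active = np.ones(n, dtype=bool)
    K = rbf(X, X)
    for _ in range(epochs):
        for i in np.flatnonzero(active):
            if y[i] * (alpha * y) @ K[:, i] <= 0:
                alpha[i] += 1
                if alpha[i] >= drop_coef:
                    active[i] = False  # large-cost SV: drop from training
    return alpha

# Toy usage on a linearly separable 2-D problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
alpha = dual_perceptron(X, y)
```
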
Analysis and Comparative Study of Source Separation Performances in Feed-Forward and Feed-Back BSSs Based on Propagation Delays in Convolutive Mixture

Feed-Forward (FF-) and Feed-Back (FB-) structures have been proposed for Blind Source Separation (BSS). FF-BSS systems have some degrees of freedom in the solution space, and signal distortion is likely to occur in convolutive mixtures. The FB-BSS structure, on the other hand, does not cause signal distortion; however, it requires a condition on the propagation delays in the mixing process. In this paper, the source separation performance of the FB-BSS is theoretically analyzed taking the propagation delays into account. Simulations are carried out using white signals and speech signals as the signal sources, and the FF-BSS and FB-BSS systems are compared. Even though the FB-BSS can provide good separation performance, there exist some limitations on the location of the signal sources and the sensors.

Akihide Horita, Kenji Nakayama, Akihiro Hirano
Learning Highly Non-separable Boolean Functions Using Constructive Feedforward Neural Network

Learning problems with inherently non-separable Boolean logic are still a challenge that has not been addressed by neural or kernel classifiers. The k-separability concept introduced recently allows for characterization of the complexity of non-separable learning problems. A simple constructive feedforward network that uses a modified form of the error function and window-like functions to localize outputs after projections on a line has been tested on such problems with quite good results. The computational cost of training is low because most nodes and connections are fixed and only the weights of one node are modified at each training step. Several examples of learning Boolean functions and results of classification tests on real-world multiclass datasets are presented.

Marek Grochowski, Włodzisław Duch
A Fast Semi-linear Backpropagation Learning Algorithm

Ever since the first gradient-based algorithm, the brilliant backpropagation proposed by Rumelhart, a variety of new training algorithms have emerged to improve different aspects of the learning process for feed-forward neural networks. One of these aspects is the learning speed. In this paper, we present a learning algorithm that combines linear least squares with gradient descent. The theoretical basis for the method is given and its performance is illustrated by its application to several examples in which it is compared with other learning algorithms on well-known data sets. Results show that the proposed algorithm improves the learning speed of the basic backpropagation algorithm by several orders of magnitude, while maintaining good optimization accuracy. Its performance and low computational cost make it an interesting alternative even to second-order methods, especially when dealing with large networks and training sets.

Bertha Guijarro-Berdiñas, Oscar Fontenla-Romero, Beatriz Pérez-Sánchez, Paula Fraguela
Improving the GRLVQ Algorithm by the Cross Entropy Method

This paper discusses an alternative approach to parameter optimization of prototype-based learning algorithms that minimize an objective function by gradient search. The proposed approach is a stochastic optimization method called the Cross Entropy (CE) method. The CE method is used to tackle the initialization sensitivity problem associated with the original generalized Learning Vector Quantization (GLVQ) algorithm and its variants, and to locate globally optimal solutions. We focus our study on a variant which uses a weighted norm instead of the Euclidean norm in order to select the most relevant features. The results in this paper indicate that the CE method can successfully be applied to this kind of problem and efficiently generates high-quality solutions. Highly competitive numerical results on real-world data sets are also reported.

Abderrahmane Boubezoul, Sébastien Paris, Mustapha Ouladsine
Incremental and Decremental Learning for Linear Support Vector Machines

We present a method to find the exact maximal margin hyperplane for linear Support Vector Machines when a new (existing) component is added to (removed from) the inner product. The maximal margin hyperplane with the new inner product is obtained in terms of that for the old inner product, without re-computing it from scratch, and the procedure is reversible. An algorithm to implement the proposed method is presented, which avoids matrix inversions from scratch. Among the possible applications, we find feature selection and the design of kernels out of similarity measures.

Enrique Romero, Ignacio Barrio, Lluís Belanche
An Efficient Method for Pruning the Multilayer Perceptron Based on the Correlation of Errors

In this paper we present a novel method for pruning redundant weights of a trained multilayer perceptron (MLP). The proposed method is based on correlation analysis of the errors produced by the output neurons and the backpropagated errors associated with the hidden neurons. Repeated application eventually leads to the complete elimination of all connections of a neuron. Simulations using real-world data indicate that, in terms of performance, the proposed method compares favorably with standard pruning techniques, such as Optimal Brain Surgeon (OBS) and Weight Decay and Elimination (WDE), but at much lower computational cost.

Cláudio M. S. Medeiros, Guilherme A. Barreto
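
One plausible reading of the pruning criterion is sketched below under stated assumptions (single hidden layer, pruning a fixed fraction of the least-correlated units); it is an illustration, not the authors' exact procedure:

```python
import numpy as np

def prune_by_error_correlation(hidden_err, output_err, weights, frac=0.2):
    """Rank hidden units by |correlation| between their backpropagated
    error and the output error over the training set, then zero the
    outgoing weights of the lowest-ranked fraction.

    hidden_err: (n_samples, n_hidden) backpropagated errors
    output_err: (n_samples,) output-neuron errors
    weights:    (n_hidden, n_outputs) hidden-to-output weights
    """
    corr = np.array([abs(np.corrcoef(hidden_err[:, j], output_err)[0, 1])
                     for j in range(hidden_err.shape[1])])
    n_prune = int(frac * len(corr))
    pruned = np.argsort(corr)[:n_prune]   # least-correlated units
    weights = weights.copy()
    weights[pruned, :] = 0.0
    return weights, pruned
```
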
Reinforcement Learning for Cooperative Actions in a Partially Observable Multi-agent System

In this article, we apply policy-gradient-based reinforcement learning to allow multiple agents to perform cooperative actions in a partially observable environment. We introduce an auxiliary state variable, an internal state, whose stochastic process is Markovian, for extracting important features of the multi-agent dynamics. Computer simulations show that every agent can identify an appropriate internal state model and acquire a good policy; this approach is shown to be more effective than a traditional memory-based method.

Yuki Taniguchi, Takeshi Mori, Shin Ishii
Input Selection for Radial Basis Function Networks by Constrained Optimization

Input selection in nonlinear function approximation is an important and difficult problem. Neural networks provide good generalization in many cases, but their interpretability is usually limited. However, the contributions of input variables to the prediction of the output would be valuable information in many real-world applications. In this work, an input selection algorithm for radial basis function networks is proposed. The selection of input variables is achieved using a constrained cost function in which each input dimension is weighted, with constraints imposed on the values of the weights. The proposed algorithm solves a log-barrier reformulation of the original optimization problem. The input selection algorithm was applied to both simulated and benchmark data, and the results obtained were compelling.

Jarkko Tikka
An Online Backpropagation Algorithm with Validation Error-Based Adaptive Learning Rate

We present a new learning algorithm for feed-forward neural networks based on the standard Backpropagation method using an adaptive global learning rate. The adaptation is based on the evolution of the error criterion but, in contrast to most other approaches, our method uses the error measured on the validation set instead of the training set to dynamically adjust the global learning rate. At no time are the examples of the validation set directly used for training the network, in order to maintain its original purpose of validating the training and of performing “early stopping”. The proposed algorithm is a heuristic method consisting of two phases. In the first phase, the learning rate is adjusted after each iteration such that a minimum of the error criterion on the validation set is quickly attained. In the second phase, this search is refined by repeatedly reverting to previous weight configurations and decreasing the global learning rate. We experimentally show that the proposed method converges rapidly and that it outperforms standard Backpropagation in terms of generalization when the size of the training set is reduced.

Stefan Duffner, Christophe Garcia
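
A compressed sketch of the two-phase heuristic as described, where `train_step`, `val_error`, the growth/shrink factors and the iteration counts are all assumptions introduced for illustration:

```python
import copy

def train_val_adaptive(net, train_step, val_error, lr=0.1,
                       phase1_iters=100, phase2_restarts=5):
    """Hedged sketch of the two-phase heuristic.

    train_step(net, lr): performs one training iteration (hypothetical).
    val_error(net):      error on the held-out validation set (hypothetical).
    The validation examples are never trained on; they only steer lr.
    """
    best, best_err = copy.deepcopy(net), val_error(net)
    prev_err = best_err
    # Phase 1: grow/shrink the global rate after every iteration.
    for _ in range(phase1_iters):
        train_step(net, lr)
        err = val_error(net)
        lr *= 1.05 if err < prev_err else 0.7   # assumed factors
        prev_err = err
        if err < best_err:
            best, best_err = copy.deepcopy(net), err
    # Phase 2: refine by reverting to the best weights with a halved rate.
    for _ in range(phase2_restarts):
        net, lr = copy.deepcopy(best), lr * 0.5
        for _ in range(phase1_iters):
            train_step(net, lr)
            err = val_error(net)
            if err < best_err:
                best, best_err = copy.deepcopy(net), err
    return best
```
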
Adaptive Self-scaling Non-monotone BFGS Training Algorithm for Recurrent Neural Networks

In this paper, we propose an adaptive BFGS which uses a self-adaptive scaling factor for the Hessian matrix and is equipped with a nonmonotone strategy. Our experimental evaluation using different recurrent network architectures provides evidence that the proposed approach successfully trains recurrent networks of various architectures, inheriting the benefits of BFGS and, at the same time, alleviating some of its limitations.

Chun-Cheng Peng, George D. Magoulas
Some Properties of the Gaussian Kernel for One Class Learning

This paper proposes a novel approach for directly tuning the Gaussian kernel matrix for one-class learning. The popular Gaussian kernel includes a free parameter, σ, that requires tuning, typically performed through validation. The value of this parameter impacts model performance significantly. This paper explores an automated method for tuning this kernel based upon a hill-climbing optimization of statistics obtained from the kernel matrix.

Paul F. Evangelista, Mark J. Embrechts, Boleslaw K. Szymanski
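
A sketch of hill climbing on σ driven by a kernel-matrix statistic; the specific statistic used below (variance of the off-diagonal entries) is an assumption for illustration and not necessarily the authors' choice:

```python
import numpy as np

def kernel_stat(X, sigma):
    """Variance of off-diagonal Gaussian-kernel entries (assumed statistic)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    off = K[~np.eye(len(X), dtype=bool)]
    return off.var()

def tune_sigma(X, sigma=1.0, step=0.5, iters=50):
    """Simple hill climbing: move sigma in whichever direction improves
    the kernel-matrix statistic, shrinking the step when neither does."""
    score = kernel_stat(X, sigma)
    for _ in range(iters):
        for cand in (sigma + step, max(sigma - step, 1e-6)):
            s = kernel_stat(X, cand)
            if s > score:
                sigma, score = cand, s
                break
        else:
            step *= 0.5  # no improvement in either direction
    return sigma

X = np.random.default_rng(0).normal(size=(30, 2))
print(tune_sigma(X))
```
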
Improved SOM Learning Using Simulated Annealing

The Self-Organizing Map (SOM) algorithm has been used extensively for analysis and classification problems. For this kind of problem, datasets are becoming larger and larger, and it is necessary to speed up SOM learning. In this paper we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal of the algorithm is to obtain fast learning and better performance in terms of matching of input data and regularity of the obtained map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparison with the original SOM and with some of its modifications introduced to speed up learning.

Antonino Fiannaca, Giuseppe Di Fatta, Salvatore Gaglio, Riccardo Rizzo, Alfonso M. Urso
The Usage of Golden Section in Calculating the Efficient Solution in Artificial Neural Networks Training by Multi-objective Optimization

In this work, a modification is made to the multi-objective optimization (MOBJ) training algorithm for artificial neural networks (NNs) of the multilayer perceptron (MLP) type, in order to increase its computational efficiency. Usually, the number of efficient solutions to be generated is a parameter that must be provided by the user. In this work, this number is determined automatically by an algorithm using the golden section, and is generally smaller than the number that would be specified, yielding a considerable reduction in processing time while keeping the high generalization capability of the solution obtained with the original method.

Roselito A. Teixeira, Antônio P. Braga, Rodney R. Saldanha, Ricardo H. C. Takahashi, Talles H. Medeiros
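
Golden-section search itself is standard; the sketch below shows how the number of efficient solutions could be chosen by minimizing a unimodal cost over a range, with the cost function being a hypothetical placeholder:

```python
import math

def golden_section_min(f, a, b, tol=1.0):
    """Minimize a unimodal function f over [a, b] by golden-section search."""
    phi = (math.sqrt(5) - 1) / 2          # ~0.618
    c, d = b - phi * (b - a), a + phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

# Placeholder cost: trade-off between solution count and (assumed) error.
cost = lambda n: (n - 17) ** 2 / 50 + 1   # hypothetical unimodal cost
n_solutions = round(golden_section_min(cost, 1, 100))
print(n_solutions)
```
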

Ensemble Learning

Designing Modular Artificial Neural Network Through Evolution

The purpose of this article is to make a contribution to the study of the modular structure of neural nets, in particular to describe a method of automatic neural net modularization. Problem-specific modularizations of the representation emerge through the iterations of the evolutionary algorithm interacting directly with the problem. We used a probability vector to construct n-bit vectors representing individuals in the population (in our approach they describe an architecture of a neural network). All individuals in every generation are pseudorandomly generated from the probability vector associated with that generation. The probability vector is updated on the basis of the best individuals in a population, so that successive generations move progressively closer to the best solutions. The process is repeated until the probability vector entries are close to zero or to one. The resulting probability vector then determines an optimal solution of the given optimization task.

Eva Volna
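
The generate-and-update loop described here closely resembles population-based incremental learning (PBIL); a minimal sketch under that reading, with the fitness function and constants assumed:

```python
import numpy as np

def pbil(fitness, n_bits, pop_size=50, lr=0.1, iters=200, seed=0):
    """Evolve a probability vector: each generation is sampled from it
    and the vector is pulled toward the best individual found."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)
    for _ in range(iters):
        pop = rng.random((pop_size, n_bits)) < p
        best = pop[np.argmax([fitness(ind) for ind in pop])]
        p = (1 - lr) * p + lr * best    # shift toward the best individual
        if np.all((p < 0.05) | (p > 0.95)):
            break                        # entries close to zero or one
    return (p > 0.5).astype(int)

# Toy usage: maximize the number of ones (one-max).
print(pbil(lambda v: v.sum(), n_bits=20))
```
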
Averaged Conservative Boosting: Introducing a New Method to Build Ensembles of Neural Networks

In this paper, a new algorithm called Averaged Conservative Boosting (ACB) is presented to build ensembles of neural networks. In ACB we mix the improvements that Averaged Boosting (Aveboost) and Conservative Boosting (Conserboost) made to Adaptive Boosting (Adaboost). In the algorithm we propose, we apply the conservative equation used in Conserboost along with the averaging procedure used in Aveboost in order to update the sampling distribution used in the training of Adaboost. We have tested the methods on seven databases from the UCI repository. The results show that the best results are provided by our method, Averaged Conservative Boosting.

Joaquín Torres-Sospedra, Carlos Hernández-Espinosa, Mercedes Fernández-Redondo
Selection of Decision Stumps in Bagging Ensembles

This article presents a comprehensive study of different ensemble pruning techniques applied to a bagging ensemble composed of decision stumps. Six different ensemble pruning methods are tested. Four of these are greedy strategies based on first reordering the elements of the ensemble according to some rule that takes into account the complementarity of the predictors with respect to the classification task. Subensembles of increasing size are then constructed by incorporating the ordered classifiers one by one. A halting criterion stops the aggregation process before the complete original ensemble is recovered. The other two approaches are selection techniques that attempt to identify optimal subensembles using either genetic algorithms or semidefinite programming. Experiments performed on 24 benchmark classification tasks show that the selection of a small subset (≈ 10 − 15%) of the original pool of stumps generated with bagging can significantly increase the accuracy and reduce the complexity of the ensemble.

Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato, Alberto Suárez
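
A sketch of the reorder-then-truncate idea: classifiers are greedily reordered so that each addition most improves ensemble accuracy, and only a small prefix is kept. The plain-accuracy criterion below is simpler than the complementarity rules studied in the paper:

```python
import numpy as np

def order_and_prune(preds, y, keep_frac=0.15):
    """preds: (n_classifiers, n_samples) array of +/-1 stump predictions.
    Greedily reorder the pool, then keep the first keep_frac of it."""
    n, remaining, order = len(preds), list(range(len(preds))), []
    votes = np.zeros(preds.shape[1])
    for _ in range(n):
        accs = [np.mean(np.sign(votes + preds[i]) == y) for i in remaining]
        best = remaining.pop(int(np.argmax(accs)))
        order.append(best)
        votes += preds[best]
    return order[: max(1, int(keep_frac * n))]
```
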
An Ensemble Dependence Measure

Ensemble methods in supervised classification problems have been shown to be superior to single base classifiers of comparable performance, particularly when used in conjunction with multi-layer perceptron base classifiers. An ensemble’s performance is related to the accuracy and diversity of its component classifiers. Intuitively, diversity seems to be a desirable quality for a collection of non-optimal classifiers. Despite much interest being shown in diversity, little progress has been made in linking generalisation performance to any specific diversity metric.

With the agglomeration of even modestly accurate statistically independent classifiers it can be shown theoretically that ensemble accuracy can be forced close to optimality. Despite this theoretical benchmark, real world ensembles fall far short of this performance. The root of this problem is the lack of statistical independence amongst the base classifiers. We investigate a measure of statistical dependence in ensembles, D, and its relationship to the Q diversity metric and pairwise correlation, and also examine voting patterns in real world ensembles. We show that, whilst Q is relatively insensitive to changes in the ensemble configuration, D measures correlations between the base classifiers effectively. The experiments are based on several two-class problems from the UCI data sets and use bootstrapped ensembles of relatively weak, multi-layer perceptron, base classifiers.

Matthew Prior, Terry Windeatt
Boosting Unsupervised Competitive Learning Ensembles

Topology preserving mappings are great tools for data visualization and inspection in large datasets. This research presents a combination of several topology preserving mapping models with some basic classifier ensemble and boosting techniques in order to increase the stability conditions and, as an extension, the classification capabilities of the former. A study and comparison of the performance of some novel and classical ensemble techniques are presented in this paper to test their suitability, both in the fields of data visualization and classification when combined with topology preserving models such as the SOM, ViSOM or ML-SIM.

Emilio Corchado, Bruno Baruque, Hujun Yin
Using Fuzzy, Neural and Fuzzy-Neural Combination Methods in Ensembles with Different Levels of Diversity

Classifier combination has been investigated as an alternative for obtaining improvements in design and/or accuracy for difficult pattern recognition problems. In the literature, many combination methods and algorithms have been developed, including methods based on computational intelligence, such as fuzzy sets, neural networks and fuzzy neural networks. This paper presents an evaluation of how different levels of diversity, reached by the choice of the components, can affect the accuracy of some combination methods. The aim of this analysis is to investigate whether or not fuzzy, neural and fuzzy-neural combination methods are affected by the choice of the ensemble members.

Anne M. P. Canuto, Marjory C. C. Abreu

Spiking Neural Networks

SpikeStream: A Fast and Flexible Simulator of Spiking Neural Networks

SpikeStream is a new simulator of biologically structured spiking neural networks that can be used to edit, display and simulate up to 100,000 neurons. This simulator uses a combination of event-based and synchronous simulation and stores most of its information in databases, which makes it easy to run simulations across an arbitrary number of machines. A comprehensive graphical interface is included and SpikeStream can send and receive spikes to and from real and virtual robots across a network. The architecture is highly modular, and so other researchers can use its graphical editing facilities to set up their own simulation networks or apply genetic algorithms to the SpikeStream databases. SpikeStream is available for free download under the terms of the GPL.

David Gamez
Evolutionary Multi-objective Optimization of Spiking Neural Networks

Evolutionary multi-objective optimization of spiking neural networks for solving classification problems is studied in this paper. By means of a Pareto-based multi-objective genetic algorithm, we are able to optimize both the classification performance and the connectivity of spiking neural networks with latency coding. During optimization, the connectivity between two neurons, i.e., whether two neurons are connected and, if connected, the weight and delay between them, is evolved. We minimize the classification error in percentage or the root mean square error to optimize performance, and minimize the number of connections or the sum of delays for connectivity, to investigate the influence of the objectives on the performance and connectivity of spiking neural networks. Simulation results on two benchmarks show that Pareto-based evolutionary optimization of spiking neural networks is able to offer a deeper insight into the properties of the spiking neural networks and the problem at hand.

Yaochu Jin, Ruojing Wen, Bernhard Sendhoff
Building a Bridge Between Spiking and Artificial Neural Networks

Spiking neural networks (SNNs) are a promising approach for the detection of patterns with a temporal component. However, they have more parameters than conventional artificial neural networks (ANNs), which makes them hard to handle. Many error-gradient-based approaches work with a time-to-first-spike code, because the explicit calculation of a gradient in SNNs is, due to the nature of spikes, very difficult. In this paper, we present the estimation of such an error gradient based on the gain function of the neurons. This is done by interpreting spike trains as rate codes in a given time interval. This way a bridge is built between SNNs and ANNs. This bridge allows us to train the SNN with the well-known error back-propagation algorithm for ANNs.

Florian Kaiser, Fridtjof Feldbusch
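
The rate-code reading can be made concrete: count spikes per interval to obtain a rate, and differentiate an assumed gain function so that ordinary back-propagation applies. The sigmoidal gain and its parameters below are assumptions for illustration:

```python
import numpy as np

def spike_train_to_rate(spike_times, t0, t1):
    """Interpret a spike train as a rate code: spikes per second in [t0, t1)."""
    spikes = np.asarray(spike_times)
    return np.sum((spikes >= t0) & (spikes < t1)) / (t1 - t0)

def gain(i, i0=10.0, k=0.5):
    """Assumed sigmoidal gain function: output rate vs. input current."""
    return 1.0 / (1.0 + np.exp(-k * (i - i0)))

def gain_grad(i, i0=10.0, k=0.5):
    """Derivative of the gain function - the quantity that lets ordinary
    error back-propagation be applied to the rate-coded network."""
    g = gain(i, i0, k)
    return k * g * (1.0 - g)

print(spike_train_to_rate([0.01, 0.05, 0.32, 0.71], t0=0.0, t1=1.0))
```
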
Clustering of Nonlinearly Separable Data Using Spiking Neural Networks

In this paper, we study the clustering capabilities of spiking neural networks. We first study the working of spiking neural networks for clustering linearly separable data. Also, a biological interpretation has been given to the delay selection in spiking neural networks. We show that by varying the firing threshold of spiking neurons during the training, nonlinearly separable data like the ring data can be clustered. When a multi-layer spiking neural network is trained for clustering, subclusters are formed in the hidden layer and these subclusters are combined in the output layer, resulting in hierarchical clustering of the data. A spiking neural network with a hidden layer is generally trained by modifying the weights of the connections to the nodes in the hidden layer and the output layer simultaneously. We propose a two-stage learning method for training a spiking neural network model for clustering. In the proposed method, the weights for the connections to the nodes in the hidden layer are learnt first, and then the weights for the connections to the nodes in the output layer are learnt. We show that the proposed two-stage learning method can cluster complex data such as the interlocking cluster data, without using lateral connections.

Lakshmi Narayana Panuku, C. Chandra Sekhar
Implementing Classical Conditioning with Spiking Neurons

In this paper, we attempt to implement classical conditioning with spiking neurons instead of connectionist neural networks. The neuron model used is a leaky linear integrate-and-fire model with a learning algorithm combining spike-time dependent Hebbian learning and spike-time dependent anti-Hebbian learning. Experimental results show that the major phenomena of classical conditioning, including Pavlovian conditioning, extinction, partial conditioning, blocking, inhibitory conditioning, overshadow and secondary conditioning, can be implemented by the spiking neuron model proposed here and further indicate that spiking neuron models are well suited to implementing classical conditioning.

Chong Liu, Jonathan Shapiro

Advances in Neural Network Architectures

Deformable Radial Basis Functions

Radial basis function (RBF) networks are efficient general function approximators. They show good generalization performance and they are easy to train. Due to theoretical considerations, RBFs commonly use Gaussian activation functions. It has been shown that these tight restrictions on the choice of possible activation functions can be relaxed in practical applications. As an alternative, differences of sigmoidal functions (SRBFs) have been proposed. SRBFs have an additional parameter which increases the ability of a network node to adapt its shape to input patterns, even in cases where Gaussian functions fail.

In this paper we follow the idea of incorporating greater flexibility into radial basis functions. We propose to use splines as localized deformable radial basis functions (DRBFs). We present initial results which show that DRBFs can be evaluated more effectively than SRBFs. We show that even with enhanced flexibility the network is easy to train and converges robustly towards smooth solutions.

Wolfgang Hübner, Hanspeter A. Mallot
Selection of Basis Functions Guided by the L2 Soft Margin

Support Vector Machines (SVMs) for classification tasks produce sparse models by maximizing the margin. Two limitations of this technique are considered in this work: firstly, the number of support vectors can be large and, secondly, the model requires the use of (Mercer) kernel functions. Recently, some works have proposed to maximize the margin while controlling the sparsity. These works also require the use of kernels. We propose a search process to select a subset of basis functions that maximize the margin without the requirement of being kernel functions. The sparsity of the model can be explicitly controlled. Experimental results show that accuracy close to SVMs can be achieved with much higher sparsity. Further, given the same level of sparsity, more powerful search strategies tend to obtain better generalization rates than simpler ones.

Ignacio Barrio, Enrique Romero, Lluís Belanche
Extended Linear Models with Gaussian Prior on the Parameters and Adaptive Expansion Vectors

We present an approximate Bayesian method for regression and classification with models linear in the parameters. Similar to the Relevance Vector Machine (RVM), each parameter is associated with an expansion vector. Unlike the RVM, the number of expansion vectors is specified beforehand. We assume an overall Gaussian prior on the parameters and find, with a gradient based process, the expansion vectors that (locally) maximize the evidence. This approach has lower computational demands than the RVM, and has the advantage that the vectors do not necessarily belong to the training set. Therefore, in principle, better vectors can be found. Furthermore, other hyperparameters can be learned in the same smooth joint optimization. Experimental results show that the freedom of the expansion vectors to be located away from the training data causes overfitting problems. These problems are alleviated by including a hyperprior that penalizes expansion vectors located far away from the input data.

Ignacio Barrio, Enrique Romero, Lluís Belanche
Functional Modelling of Large Scattered Data Sets Using Neural Networks

We propose a self-organising hierarchical Radial Basis Function (RBF) network for functional modelling of large amounts of scattered unstructured point data. The network employs an error-driven active learning algorithm and a multi-layer architecture, allowing progressive bottom-up reinforcement of local features in subdivisions of error clusters. For each RBF subnet, neurons can be inserted, removed or updated iteratively with full dimensionality adapting to the complexity and distribution of the underlying data. This flexibility is particularly desirable for highly variable spatial frequencies. Experimental results demonstrate that the network representation is conducive to geometric data formulation and simplification, and therefore to manageable computation and compact storage.

Q. Meng, B. Li, N. Costen, H. Holstein
Stacking MF Networks to Combine the Outputs Provided by RBF Networks

The performance of a Radial Basis Functions network (RBF) can be increased with the use of an ensemble of RBF networks, because RBF networks are successfully applied to solve classification problems and can be trained by gradient descent algorithms. Reviewing the bibliography, we can see that the performance of ensembles of Multilayer Feedforward (MF) networks can be improved by the use of the two combination methods based on Stacked Generalization described in [1]. We think that we could get a better classification system if we applied these combiners to an RBF ensemble. In this paper we successfully apply these two new methods, Stacked and Stacked+, to ensembles of RBF networks. Increasing the number of networks used in the combination module is also successfully proposed in this paper. The results show that training 3 MF networks to combine an RBF ensemble is the best alternative.

Joaquín Torres-Sospedra, Carlos Hernández-Espinosa, Mercedes Fernández-Redondo
Neural Network Processing for Multiset Data

This paper introduces the notion of the variadic neural network (VNN). The inputs to a variadic network are an arbitrary-length list of n-tuples of real numbers, where n is fixed. In contrast to a recurrent network, which processes a list sequentially and is typically affected more by more recent list elements, a variadic network processes the list simultaneously and is affected equally by all list elements. Formally speaking, the network can be seen as instantiating a function on a multiset along with a member of that multiset. I describe a simple implementation of a variadic network architecture, the multi-layer variadic perceptron (MLVP), and present experimental results showing that such a network can learn various variadic functions by back-propagation.

Simon McGregor
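
The multiset view suggests a permutation-invariant architecture: embed each n-tuple identically and pool with a symmetric function. This deep-sets-style sketch is one way to realize the idea, not necessarily the paper's MLVP:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyVariadicNet:
    """Each n-tuple is embedded by the same weights; mean pooling makes
    the output invariant to the order of the list, so every element
    affects the result equally."""
    def __init__(self, n_in, hidden=16):
        self.W1 = rng.normal(scale=0.5, size=(n_in, hidden))
        self.w2 = rng.normal(scale=0.5, size=hidden)

    def __call__(self, tuples):                     # tuples: (list_len, n_in)
        h = np.tanh(np.asarray(tuples) @ self.W1)   # per-element embedding
        return float(np.mean(h, axis=0) @ self.w2)  # symmetric pooling

net = TinyVariadicNet(n_in=3)
x = rng.normal(size=(5, 3))
assert np.isclose(net(x), net(x[::-1]))  # element order does not matter
```
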
The Introduction of Time-Scales in Reservoir Computing, Applied to Isolated Digits Recognition

Reservoir Computing (RC) is a recent research area in which an untrained recurrent network of nodes is used for the recognition of temporal patterns. In contrast to recurrent neural networks (RNNs), where the weights of the connections between the nodes are trained, only a linear output layer is trained. We introduce three different time-scales and show that the performance and computational complexity are highly dependent on these time-scales. This is demonstrated on an isolated spoken digits task.

Benjamin Schrauwen, Jeroen Defour, David Verstraeten, Jan Van Campenhout
Partially Activated Neural Networks by Controlling Information

In this paper, we propose partial activation to simplify complex neural networks. For choosing the important elements in a network, we develop a fully supervised competitive learning method that can deal with arbitrary targets. This approach is an extension of competitive learning to a more general form that includes supervised learning. Because competitive learning focuses on one important competitive unit, all the other competitive units are of no use, so the number of connection weights to be updated can be reduced to a minimum when we use competitive learning. We apply the method to the XOR problem to show that learning is possible with good interpretability of internal representations. Then, we apply the method to a student survey. In this problem, we show that the new method can produce connection weights that are more stable than those produced by BP. In addition, we show that, though the connection weights are quite similar to those produced by linear regression analysis, generalization performance can be improved by changing the number of competitive units.

Ryotaro Kamimura
CNN Based Hole Filler Template Design Using Numerical Integration Techniques

This paper presents a design method for the template of a hole-filler, used to improve the performance of handwritten character recognition, based on numerical integration algorithms and the dynamic analysis of a cellular neural network (CNN). This is done by analyzing the features of the hole-filler template and the dynamic process of the CNN using popular numerical integration algorithms, in order to obtain a set of inequalities satisfying its output characteristics as well as the parameter range of the hole-filler template. Simulation results are presented and compared for the Euler, modified Euler and Runge-Kutta (RK) methods. It was found that the RK method performs well in terms of settling time and computation time for all step sizes.

K. Murugesan, P. Elango
Impact of Shrinking Technologies on the Activation Function of Neurons

Artificial neural networks are able to solve a great variety of different applications, e.g. classification or approximation tasks. To utilize their advantages in technical systems, various hardware realizations exist. In this work, the impact of shrinking device sizes on the activation function of neurons is investigated with respect to area demands, power consumption and the maximum resolution of their information processing. Furthermore, analog and digital implementations are compared in emerging silicon technologies beyond 100 nm feature size.

Ralf Eickhoff, Tim Kaulmann, Ulrich Rückert
Rectangular Basis Functions Applied to Imbalanced Datasets

Rectangular Basis Function Networks (RecBFNs) are derived from RBF networks and are composed of a set of fuzzy points which describe the network. In this paper, a set of characteristics of the RecBF is proposed for use with imbalanced datasets, especially the order of the training patterns. We demonstrate that this order is an important factor in improving the generalization of the solution, which is the main problem in imbalanced datasets. Finally, this solution is compared with other important methods for working with imbalanced datasets, showing that our method works well with this type of dataset and that an understandable set of rules can be extracted.

Vicenç Soler, Marta Prim
Qualitative Radial Basis Function Networks Based on Distance Discretization for Classification Problems

This paper presents a radial basis function neural network which leads to classifiers of lower complexity by using a qualitative radial function based on distance discretization. The proposed neural network model generates smaller solutions for a similar generalization performance, giving rise to classifiers with reduced complexity in the sense of fewer radial basis functions. Classification experiments on real-world data sets show that the number of radial basis functions can in some cases be reduced significantly without affecting the classification accuracy.

Xavier Parra, Andreu Català
A Control Approach to a Biophysical Neuron Model

In this paper we present a neuron model based on the description of biophysical mechanisms combined with a regulatory mechanism from control theory. The aim of this work is to provide a neuron model that is capable of describing the main features of biological neurons such as maintaining an equilibrium potential using the NaK-ATPase and the generation of action potentials as well as to provide an estimation of the energy consumption of a single cell in a) quiescent mode (or equilibrium state) and b) firing state, when excited by other neurons. The same mechanism has also been used to model the synaptic excitation used in the simulated system.

Tim Kaulmann, Axel Löffler, Ulrich Rückert
Integrate-and-Fire Neural Networks with Monosynaptic-Like Correlated Activity

To study the physiology of the central nervous system it is necessary to understand the properties of the neural networks that integrate it and conform its functional substratum. Modeling and simulation of neural networks allow us to face this problem and consider it from the point of view of the analysis of activity correlation between pairs of neurons. In this paper, we define an optimized integrate-and-fire model of the simplest network possible, the monosynaptic circuit, and we raise the problem of searching for alternative non-monosynaptic circuits that generate monosynaptic-like correlated activity. For this purpose, we design an evolutionary algorithm with a crossover-with-speciation operator that works on populations of neural networks. The optimization of the neuronal model and the concurrent execution of the simulations allow us to efficiently cover the search space to finally obtain networks with monosynaptic-like correlated activity.

Héctor Mesa, Francisco J. Veredas
Multi-dimensional Recurrent Neural Networks

Recurrent neural networks (RNNs) have proved effective at one dimensional sequence learning tasks, such as speech and online handwriting recognition. Some of the properties that make RNNs suitable for such tasks, for example robustness to input warping, and the ability to access contextual information, are also desirable in multi-dimensional domains. However, there has so far been no direct way of applying RNNs to data with more than one spatio-temporal dimension. This paper introduces multi-dimensional recurrent neural networks, thereby extending the potential applicability of RNNs to vision, video processing, medical imaging and many other areas, while avoiding the scaling problems that have plagued other multi-dimensional models. Experimental results are provided for two image segmentation tasks.

Alex Graves, Santiago Fernández, Jürgen Schmidhuber
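
The core recurrence generalizes the 1-D case by giving each spatial axis its own recurrent weight matrix; a single-direction 2-D forward sweep might look like this sketch (dimensions and tanh nonlinearity assumed):

```python
import numpy as np

def mdrnn_forward(X, Wx, Wh1, Wh2, b):
    """2-D RNN sweep: the hidden state at (i, j) depends on the input
    there and on the hidden states at (i-1, j) and (i, j-1).

    X: (H, W, n_in) image, Wx: (n_in, n_h), Wh1/Wh2: (n_h, n_h), b: (n_h,)
    """
    H, W, _ = X.shape
    n_h = b.shape[0]
    h = np.zeros((H, W, n_h))
    for i in range(H):
        for j in range(W):
            up = h[i - 1, j] if i > 0 else np.zeros(n_h)
            left = h[i, j - 1] if j > 0 else np.zeros(n_h)
            h[i, j] = np.tanh(X[i, j] @ Wx + up @ Wh1 + left @ Wh2 + b)
    return h

# Toy usage on a random 8x8 "image" with 3 channels and 5 hidden units.
rng = np.random.default_rng(0)
h = mdrnn_forward(rng.normal(size=(8, 8, 3)), rng.normal(size=(3, 5)),
                  rng.normal(size=(5, 5)), rng.normal(size=(5, 5)),
                  np.zeros(5))
```
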
FPGA Implementation of an Adaptive Stochastic Neural Model

In this paper an FPGA implementation of a novel neural stochastic model for solving constrained NP-hard problems is proposed and developed. The hardware implementation allows high computation speed to be obtained by exploiting parallelism, as the neuron update and the constraint violation check phases can be performed simultaneously.

The neural system has been tested on random and benchmark graphs, showing good performance with respect to the same heuristic for the same problems. Furthermore, the computational speed of the FPGA implementation has been measured and compared to a software implementation. The developed architecture features dramatically faster computation with respect to the software implementation, even when adopting a low-cost FPGA chip.

Giuliano Grossi, Federico Pedersini

Neural Dynamics and Complex Systems

Global Robust Stability of Competitive Neural Networks with Continuously Distributed Delays and Different Time Scales

The dynamics of cortical cognitive maps developed by self-organization must include the aspects of long- and short-term memory. The behavior of such a neural network is characterized by an equation of neural activity as a fast phenomenon and an equation of synaptic modification as a slow part of the neural system; moreover, this model is based on an unsupervised synaptic learning algorithm. In this paper, using topological degree theory and strict Lyapunov functional methods, we prove existence and uniqueness of the equilibrium of competitive neural networks with continuously distributed delays and different time scales, and present some new criteria for its global robust stability.

Yonggui Kao, QingHe Ming
Nonlinear Dynamics Emerging in Large Scale Neural Networks with Ontogenetic and Epigenetic Processes

We simulated a large-scale spiking neural network characterized by an initial developmental phase featuring cell death driven by an excessive firing rate, followed by the onset of spike-timing-dependent synaptic plasticity (STDP), driven by spatiotemporal patterns of stimulation. The network activity stabilized such that recurrent preferred firing sequences appeared during the STDP phase. The analysis of the statistical properties of these patterns gives hints in favor of the hypothesis that a neural network may be characterized by a particular state of an underlying dynamical system that produces recurrent firing patterns.

Javier Iglesias, Olga K. Chibirova, Alessandro E. P. Villa
Modeling of Dynamics Using Process State Projection on the Self Organizing Map

In this paper, an approach to modeling the dynamics of multivariable processes based on motion analysis of the process state trajectory is presented. The trajectory followed by the projection of the process state onto the 2-D neural lattice of a Self-Organizing Map (SOM) is used as the starting point of the analysis. In a first approach, a coarse-grain, cluster-level model is proposed to identify the possible transitions among process operating conditions (clusters). Alternatively, in a finer-grain, neuron-level approach, a SOM neural network whose inputs are 6-dimensional vectors encoding the trajectory (T-SOM) is defined at the top level, while the KR-SOM, a generalization of the SOM algorithm to the continuous case, is used at the bottom level for continuous trajectory generation, in order to avoid the problems caused in trajectory analysis by the discrete nature of the SOM. Experimental results on the application of the proposed modeling method to supervise a real industrial plant are included.

Juan J. Fuertes-Martínez, Miguel A. Prada, Manuel Domínguez-González, Perfecto Reguera-Acevedo, Ignacio Díaz-Blanco, Abel A. Cuadrado-Vega
Fixed Points of the Abe Formulation of Stochastic Hopfield Networks

The stability of stochastic Hopfield neural networks, in the Abe formulation, is studied. The aim is to determine whether the ability of the deterministic system to solve combinatorial optimization problems is preserved after the addition of random noise. In particular, the stochastic stability of the attractor set is analyzed: vertices, which are feasible points of the problem, should be stable, whereas interior points, which are unfeasible, should be unstable. Conditions on the noise intensity are stated, so that these properties are guaranteed. This theoretical investigation establishes the foundations for practical application of stochastic networks to combinatorial optimization.

Marie Kratz, Miguel Atencia, Gonzalo Joya
Visualization of Dynamics Using Local Dynamic Modelling with Self Organizing Maps

In this work, we describe a procedure to visualize nonlinear process dynamics using a self-organizing-map-based local model dynamical estimator. The proposed method exploits the topology-preserving nature of the resulting estimator to extract visualizations (planes) of insightful dynamical features, which allow the exploration of nonlinear systems whose behavior changes with the operating point. Since the visualizations are obtained from a dynamical model of the process, measures of the goodness of this estimator (such as RMSE or AIC) are also applicable as a measure of the trustworthiness of the visualizations. To illustrate the application of the proposed method, an experiment analyzing the dynamics of a nonlinear system at different operating points is included.

Ignacio Díaz-Blanco, Abel A. Cuadrado-Vega, Alberto B. Diez-González, Juan J. Fuertes-Martínez, Manuel Domínguez-González, Perfecto Reguera-Acevedo
Comparison of Echo State Networks with Simple Recurrent Networks and Variable-Length Markov Models on Symbolic Sequences

A great deal of attention is now being focused on connectionist models known under the name “reservoir computing”. The most prominent example of these approaches is a recurrent neural network architecture called an echo state network (ESN). ESNs have been successfully applied to many real-valued time series modeling tasks, on which they performed exceptionally well. Using ESNs to process symbolic sequences therefore also seems attractive. In this work we experimentally support the claim that the state space of an ESN is organized according to the Markovian architectural bias principles when processing symbolic sequences. We compare the performance of ESNs with connectionist models that explicitly use the Markovian architectural bias property, with variable-length Markov models, and with recurrent neural networks trained by advanced training algorithms. Moreover, we show that the number of reservoir units plays a role similar to that of the number of contexts in variable-length Markov models.
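
For orientation, a textbook ESN (not necessarily the exact configuration used in the paper) updates a fixed random reservoir and trains only the linear readout, e.g. by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
N_RES, N_IN = 100, 1                           # sizes are illustrative
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))   # fixed random input weights
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))     # fixed random reservoir
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

def run_reservoir(U):
    """Collect reservoir states for an input sequence U (T x N_IN)."""
    x = np.zeros(N_RES)
    states = []
    for u in U:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Readout training by least squares, given targets Y (T x n_out):
# W_out = np.linalg.pinv(run_reservoir(U)) @ Y
```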

Michal Čerňanský, Peter Tiňo

Data Analysis

Data Fusion and Auto-fusion for Quantitative Structure-Activity Relationship (QSAR)

Data fusion originally referred to the process of combining multi-sensor data from different sources such that the resulting information/model is in some sense better than would be possible if these sources were used individually. In this paper the data fusion concept is extended to molecular drug design. Rather than using data from different sensor sources, different descriptor sets are used to predict activities or responses for a set of molecules. Data fusion techniques are applied in order to improve the predictive (QSAR) model on test data; this type of data fusion is referred to as auto-fusion. An effective auto-fusion functional model and alternative architectures are proposed for a predictive molecular design (QSAR) model of the binding affinity to human serum albumin.

Changjian Huang, Mark J. Embrechts, N. Sukumar, Curt M. Breneman
Cluster Domains in Binary Minimization Problems

It was previously found that, when minimizing a quadratic functional depending on a great number of binary variables, it is advantageous to use aggregated variables that join independent binary variables into blocks (domains); one then succeeds in finding deeper local minima of the functional. In the present publication we investigate an algorithm for domain formation based on clustering of the connection matrix.
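
The quadratic functional in question is typically of the form (a generic statement, with spins assumed in {-1, +1}):

$$E(\mathbf{s}) = -\frac{1}{2}\sum_{i \neq j} J_{ij}\, s_i s_j, \qquad s_i \in \{-1,+1\},$$

where $J_{ij}$ is the connection matrix whose clustering drives the domain formation.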

Leonid B. Litinskii
MaxSet: An Algorithm for Finding a Good Approximation for the Largest Linearly Separable Set

Finding the largest linearly separable set of examples for a given Boolean function is an NP-hard problem that is relevant to neural network learning algorithms and to several problems that can be formulated as the minimization of a set of inequalities. In this work we propose a new algorithm based on finding a unate subset of the input examples, with which a perceptron is then trained to find an approximation of the largest linearly separable subset. On a large set of benchmark functions, the results of the new algorithm are compared to those obtained by applying the Pocket learning algorithm directly to the whole set of inputs, and show a clear improvement in the size of the linearly separable subset obtained.

Leonardo Franco, José Luis Subirats, José M. Jerez
Generalized Softmax Networks for Non-linear Component Extraction

We develop a probabilistic interpretation of non-linear component extraction in neural networks that activate their hidden units according to a softmax-like mechanism. On the basis of a generative model that combines hidden causes using the max-function, we show how the extraction of input components in such networks can be interpreted as maximum likelihood parameter optimization. A simple and neurally plausible Hebbian Δ-rule is derived. For approximately-optimal learning, the activity of the hidden neural units is described by a generalized softmax function and the classical softmax is recovered for very sparse input. We use the bars benchmark test to numerically verify our analytical results and to show the competitiveness of the derived learning algorithms.
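
As a reminder, the classical softmax recovered in the sparse-input limit is

$$y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}.$$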

Jörg Lücke, Maneesh Sahani
Stochastic Weights Reinforcement Learning for Exploratory Data Analysis

We review a new form of immediate-reward reinforcement learning in which the individual unit is deterministic but has stochastic synapses. Four learning rules have been developed from this perspective, and we investigate their use in performing linear projection techniques such as principal component analysis, exploratory projection pursuit and canonical correlation analysis. The method is very general and simply requires a reward function specific to the task we require the unit to perform. We also discuss how the method can be used to learn kernel mappings, and conclude by illustrating its use on a topology-preserving mapping.

Ying Wu, Colin Fyfe, Pei Ling Lai
Post Nonlinear Independent Subspace Analysis

In this paper a generalization of Post Nonlinear Independent Component Analysis (PNL-ICA) to Post Nonlinear Independent Subspace Analysis (PNL-ISA) is presented. In this framework, the sources to be identified may also be multidimensional. For this generalization we prove a separability theorem: the ambiguities of this problem are essentially the same as those of linear Independent Subspace Analysis (ISA). Applying this result, we derive an algorithm that exploits the mirror structure of the mixing system. Numerical simulations are presented to illustrate the efficiency of the algorithm.

Zoltán Szabó, Barnabás Póczos, Gábor Szirtes, András Lőrincz

Estimation

Algebraic Geometric Study of Exchange Monte Carlo Method

In hierarchical learning machines such as neural networks, Bayesian learning provides better generalization performance than maximum likelihood estimation. However, its accurate approximation by the Markov chain Monte Carlo (MCMC) method requires a huge computational cost. The exchange Monte Carlo (EMC) method was proposed as an improvement on the MCMC method. Although its effectiveness has been shown not only in Bayesian learning but also in many other fields, the mathematical foundation of the EMC method has not yet been established. In this paper, we clarify the asymptotic behavior of the symmetrized Kullback divergence and the average exchange ratio, which are used as criteria for designing the EMC method.
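
For reference, in EMC (parallel tempering) two replicas at neighbouring inverse temperatures $\beta_m < \beta_{m+1}$ are exchanged with the standard acceptance probability

$$P_{\mathrm{swap}} = \min\left\{1,\; \exp\left[(\beta_{m+1}-\beta_m)\left(E(x_{m+1})-E(x_m)\right)\right]\right\},$$

and the average of this exchange ratio is one of the design criteria analyzed in the paper.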

Kenji Nagata, Sumio Watanabe
Solving Deep Memory POMDPs with Recurrent Policy Gradients

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
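
Schematically, the estimator is a REINFORCE-style gradient in which the policy conditions on the RNN's history representation $h_t$ (a hedged, generic statement rather than the paper's exact derivation):

$$\nabla_\theta J \approx \frac{1}{N}\sum_{n=1}^{N}\sum_{t} \nabla_\theta \log \pi_\theta\!\left(a_t^n \mid h_t^n\right) R^n,$$

where $R^n$ is the return of episode $n$ and the log-policy gradients are obtained by backpropagation through time.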

Daan Wierstra, Alexander Foerster, Jan Peters, Jürgen Schmidhuber
Soft Clustering for Nonparametric Probability Density Function Estimation

We present a nonparametric probability density estimation model. The classical Parzen window approach builds a spherical Gaussian density around every input sample. Our method has a first stage in which hard neighbourhoods are determined for every sample. Soft clusters are then considered to merge the information coming from several hard neighbourhoods. Our proposal estimates the local principal directions to yield a specific Gaussian mixture component for each soft cluster. This allows it to outperform other proposals in which local parameter selection is not allowed and/or there are no smoothing strategies, such as manifold Parzen windows.
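
For comparison, the classical Parzen window estimate with isotropic Gaussian kernels of width $\sigma$ is

$$\hat{p}(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert\mathbf{x}-\mathbf{x}_i\rVert^2}{2\sigma^2}\right).$$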

Ezequiel López-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, Domingo López-Rodríguez, María del Carmen Vargas-González
Vector Field Approximation by Model Inclusive Learning of Neural Networks

The problem of vector field approximation arises in a wide range of fields such as motion control and computer vision. This paper proposes a method for reconstructing an entire continuous vector field from a sparse set of sample data by training neural networks. In order to make the approximation results possess the inherent properties of vector fields and to attain reasonable approximation accuracy with computational efficiency, we include a priori knowledge of the inherent properties of vector fields in the learning problem of the neural networks, which we call model inclusive learning. An efficient learning algorithm for the neural networks is derived. It is shown through numerical experiments that the proposed method makes it possible to reconstruct vector fields accurately and efficiently.

Yasuaki Kuroe, Hajimu Kawakami
Spectral Measures for Kernel Matrices Comparison

With the emergence of data fusion techniques (kernel combinations, ensemble methods and boosting algorithms), the task of comparing distance/similarity/kernel matrices is becoming increasingly relevant. However, the choice of an appropriate metric for matrices involved in pattern recognition problems is far from trivial.

In this work we propose a general spectral framework to build metrics for matrix spaces. Within the general framework of matrix pencils, we propose a new metric for symmetric and positive semi-definite matrices, called the Pencil Distance (PD). The generality of our approach is demonstrated by showing that the Kernel Alignment (KA) measure is a particular case of our spectral approach.

We illustrate the performance of the proposed measures using some classification problems.
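
For reference, the Kernel Alignment measure recovered as a special case is the Frobenius cosine between kernel matrices:

$$\mathrm{KA}(K_1,K_2) = \frac{\langle K_1, K_2\rangle_F}{\sqrt{\langle K_1, K_1\rangle_F\,\langle K_2, K_2\rangle_F}}.$$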

Javier González, Alberto Muñoz
A Novel and Efficient Method for Testing Non Linear Separability

The notion of linear separability is widely used in machine learning research. Learning algorithms that use this concept include neural networks (the Single Layer Perceptron and the Recursive Deterministic Perceptron) and kernel machines (Support Vector Machines). Several algorithms for testing linear separability exist. Some of these methods are computationally intensive. Also, several of them will converge if the classes are linearly separable, but will fail to converge otherwise. A fast and efficient test for non linear separability is proposed, which can be used to pretest classification data sets for non linear separability, thus avoiding expensive computations. The test is based on the convex hull separability method but does not require the computation of the convex hull.
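
The geometric fact underlying the convex hull separability method is that two finite point classes are (strictly) linearly separable exactly when their convex hulls are disjoint:

$$X^{+},\,X^{-}\ \text{linearly separable} \iff \operatorname{conv}(X^{+}) \cap \operatorname{conv}(X^{-}) = \emptyset.$$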

David Elizondo, Juan Miguel Ortiz-de-Lazcano-Lobato, Ralph Birkenhead
A One-Step Unscented Particle Filter for Nonlinear Dynamical Systems

This paper proposes a one-step unscented particle filter for accurate nonlinear estimation. Its design involves the elaboration of a reliable one-step unscented filter that draws state samples deterministically to perform both the time and measurement updates, without linearization of the observation model. Empirical investigations show that the one-step unscented particle filter compares favourably to relevant filters in nonlinear dynamic systems modelling.

Nikolay Y. Nikolaev, Evgueni Smirnov

Spatial and Spatio-Temporal Learning

Spike-Timing-Dependent Synaptic Plasticity to Learn Spatiotemporal Patterns in Recurrent Neural Networks

Assuming an asymmetric time window of spike-timing-dependent synaptic plasticity (STDP), we study spatiotemporal learning in recurrent neural networks. We first present numerical simulations of spiking neural networks in which spatiotemporal Poisson patterns (i.e., random spatiotemporal patterns generated by independent Poisson processes) are successfully memorized by the STDP-based learning rule. We then discuss the underlying mechanism of STDP-based learning, drawing on our recent analysis of associative memory analog neural networks for periodic spatiotemporal patterns. Order parameter dynamics in the analog neural networks explains the change of time scale in the retrieval process and the shape of the STDP time window that is optimal for encoding a large number of spatiotemporal patterns. The analysis further elucidates a phase transition due to destabilization of the retrieval state. These findings on analog neural networks are consistent with previous results on spiking neural networks. This STDP-based spatiotemporal associative memory may provide some insight into recent experimental results in which spatiotemporal patterns are found to be retrieved at various time scales.

Masahiko Yoshioka, Silvia Scarpetta, Maria Marinaro
A Distributed Message Passing Algorithm for Sensor Localization

We propose a fully distributed message passing algorithm based on expectation propagation for the purpose of sensor localization. Sensors perform noisy measurements of their mutual distances and their relative angles. These measurements form the basis for an iterative, local (i.e. distributed) algorithm that computes the sensors' locations, including uncertainties for these estimates. This approach offers a distributed, computationally efficient and flexible framework for information fusion in sensor networks.

Max Welling, Joseph J. Lim
An Analytical Model of Divisive Normalization in Disparity-Tuned Complex Cells

Based on the energy model for disparity-tuned neurons, we calculate probability density functions of complex cell activity for random-dot stimuli. We investigate the effects of normalization and give analytical expressions for the disparity tuning curve and its variance. We show that while normalized and non-normalized complex cells have similar tuning curves, the variance is significantly lower for normalized complex cells, which makes disparity estimation more reliable. The results of the analytical calculations are compared to computer simulations.
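
A generic form of divisive normalization (stated here as an assumption, not necessarily the paper's exact model) divides each energy-model response by the pooled activity:

$$R_i = \frac{E_i}{\sigma^2 + \sum_j E_j},$$

where $E_i$ is the energy-model response of complex cell $i$ and $\sigma$ a semi-saturation constant.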

Wolfgang Stürzl, Hanspeter A. Mallot, A. Knoll

Evolutionary Computing

Automatic Design of Modular Neural Networks Using Genetic Programming

The traditional trial-and-error approach to designing neural networks is time-consuming and does not guarantee the best feasible neural network for a specific application. Automatic approaches have therefore gained importance and popularity. In addition, traditional (non-modular) neural networks cannot solve complex problems, since such problems introduce a wide range of overlap which, in turn, causes a wide range of deviations from efficient learning in different regions of the input space; a modular neural network attempts to reduce the effect of these problems via a divide-and-conquer approach. In this paper we introduce a different approach to the autonomous design of modular neural networks, using genetic programming to design their architectures, transfer functions and connection weights automatically. Our approach offers important advantages over existing methods for automated neural network design. First, it prefers smaller modules to bigger ones; second, it allows neurons even in the same layer to use different transfer functions; and third, it is not necessary to convert each individual into a neural network to obtain its fitness value during the evolution process. Several tests were performed on problems based on some of the most popular benchmark databases. The results show that using genetic programming for the automatic design of neural networks is an efficient method and is comparable with existing techniques.

Naser NourAshrafoddin, Ali R. Vahdat, Mohammad Mehdi Ebadzadeh
Blind Matrix Decomposition Via Genetic Optimization of Sparseness and Nonnegativity Constraints

Nonnegative Matrix Factorization (NMF) has proven to be a useful tool for the analysis of nonnegative multivariate data. However, it is known not to lead to unique results when applied to nonnegative Blind Source Separation (BSS) problems. In this paper we present first results of an extension to the NMF algorithm which solves the BSS problem when the underlying sources are sufficiently sparse. As the proposed target function is discontinuous and possesses many local minima, we use a genetic algorithm for its minimization. An application to a microarray data set is also considered.
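
For context, standard NMF seeks nonnegative factors W, H minimizing the reconstruction error; a minimal sketch of the classical multiplicative updates (Lee-Seung; shown for orientation, not the genetic extension proposed here):

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Classical multiplicative-update NMF: X (m x n) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H, preserving nonnegativity
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W, preserving nonnegativity
    return W, H
```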

Kurt Stadlthanner, Fabian J. Theis, Elmar W. Lang, Ana Maria Tomé, Carlos G. Puntonet

Meta Learning, Agents Learning

Meta Learning Intrusion Detection in Real Time Network

With the rapid increase in the connectivity and accessibility of computer systems over the internet, which has resulted in frequent opportunities for intrusions and attacks, intrusion detection has become a crucial issue for computer system security. Methods based on hand-coded rule sets are laborious to build and not very reliable. This problem has led to an increasing interest in intrusion detection techniques based on machine learning or data mining. However, traditional data-mining-based intrusion detection systems use a single classifier in their detection engines. In this paper, we propose a meta-learning-based method for intrusion detection that applies MultiBoosting to multiple classifiers. MultiBoosting forms decision committees by combining AdaBoost with wagging, and is thus able to harness AdaBoost's high bias and variance reduction together with wagging's superior variance reduction. Experimental results show that MultiBoosting can improve the detection performance of state-of-the-art machine-learning-based intrusion detection techniques. Furthermore, we present a Symmetrical Uncertainty (SU) based method for reducing network connection features that makes MultiBoosting more efficient in a real-time network environment, while keeping the detection performance unimpaired and, in some cases, even further improving it.

Rongfang Bie, Xin Jin, Chuanliang Chen, Chuan Xu, Ronghuai Huang
Active Learning to Support the Generation of Meta-examples

Meta-Learning has been used to select algorithms based on the features of the problems being tackled. Each training example in this context, i.e. each meta-example, stores the features of a given problem and the performance information obtained by the candidate algorithms on that problem. The construction of a set of meta-examples may be costly, since the algorithms' performance is usually assessed through an empirical evaluation on the problem at hand. In this context, we propose the use of Active Learning to select only the relevant problems for meta-example generation, reducing the need for empirical evaluations of the candidate algorithms. Experiments were performed using the classification uncertainty of the k-NN algorithm as the criterion for active selection of problems. A significant gain in performance was yielded by the Active Learning method.

Ricardo Prudêncio, Teresa Ludermir
Co-learning and the Development of Communication

We investigate the properties of coupled co-learning systems during the emergence of communication. Co-learning systems are more complex than individual learning systems because each depends on the learning process of the other, thus risking divergence. We developed a neural network approach and implemented a concept that we call the reconstruction principle, which we found adequate for overcoming the instability problem. Experimental simulations were performed to test the emergence of both compositional and holistic communication. The results show that compositional communication is favorable when learning performance is considered; however, it is more sensitive to differences in the conceptual representations of the individual systems. We show that our architecture enables the adjustment of differences in the individual representations in the case of compositional communication.

Viktor Gyenes, András Lőrincz

Complex-Valued Neural Networks (Special Session)

Models of Orthogonal Type Complex-Valued Dynamic Associative Memories and Their Performance Comparison

Associative memories are one of the popular applications of neural networks, and several studies on their extension to the complex domain have been conducted. One of the important factors characterizing the behavior of a complex-valued neural network is its activation function, which is a nonlinear complex function. In complex-valued neural networks there are several possibilities for choosing an activation function, owing to the wide variety of complex functions. This paper proposes three models of orthogonal-type dynamic associative memories using complex-valued neural networks with three different activation functions. We investigate their behavior as associative memories theoretically. Comparisons are also made among these three models in terms of dynamics and storage capabilities.

Yasuaki Kuroe, Yuriko Taniguchi
Dynamics of Discrete-Time Quaternionic Hopfield Neural Networks

We analyze a discrete-time quaternionic Hopfield neural network with continuous state variables updated asynchronously. The state of a neuron takes a quaternionic value, which is a four-dimensional hypercomplex number. Two types of activation function for updating neuron states are introduced and examined. The stable states of the network are demonstrated through the example of a small network.

Teijiro Isokawa, Haruhiko Nishimura, Naotake Kamiura, Nobuyuki Matsui
Neural Learning Algorithms Based on Mappings: The Case of the Unitary Group of Matrices

Neural learning algorithms based on optimization on manifolds differ in the way the single learning steps are effected on the neural system's parameter space. In this paper, we present a class of four neural learning algorithms based on the differential geometric concept of mappings from the tangent space of a manifold to the manifold itself. A theory of learning stepsize adaptation is also proposed, under the hypothesis that the learning criterion is additive. The numerical performances of the discussed algorithms are illustrated on a learning task and are compared to a reference algorithm known from the literature.

Simone Fiori
Optimal Learning Rates for Clifford Neurons

Neural computation in Clifford algebras, which include the familiar complex numbers and quaternions as special cases, has recently become an active research field. As always, neurons are the atoms of computation. This paper provides a general notion of the Hessian matrix of Clifford neurons of an arbitrary algebra. This new result on the dynamics of Clifford neurons then allows the computation of optimal learning rates. A thorough discussion of error surfaces together with simulation results for different neurons is also provided. The presented results should give rise to very efficient second-order training methods for Clifford multi-layer perceptrons in the future.

Sven Buchholz, Kanta Tachibana, Eckhard M. S. Hitzer
Solving Selected Classification Problems in Bioinformatics Using Multilayer Neural Network Based on Multi-Valued Neurons (MLMVN)

A multilayer neural network based on multi-valued neurons (MLMVN) is a new powerful tool for solving classification, recognition and prediction problems. This network has a number of specific properties and advantages that follow from the nature of a multi-valued neuron (complex-valued weights and inputs/outputs lying on the unit circle). Its backpropagation learning algorithm is derivative-free. The learning process converges very quickly, and the learning rate for all neurons is self-adaptive. The functionality of the MLMVN is higher than that of traditional feedforward neural networks and a variety of kernel-based networks. Its higher flexibility and faster adaptation to the mapping being implemented make it possible to solve complex classification problems using a simpler network. In this paper, we show that the MLMVN can be successfully used for solving two selected classification problems in bioinformatics.

Igor Aizenberg, Jacek M. Zurada
Error Reduction in Holographic Movies Using a Hybrid Learning Method in Coherent Neural Networks

Computer Generated Holograms (CGHs) are commonly used in optical tweezers, which are employed in various research fields. Frame interpolation using coherent neural networks (CNNs) based on correlation learning can be used to generate holographic movies efficiently. However, the error that appears in the interpolated CGH images needs to be reduced even further before the frame interpolation method can be accepted for general use. In this paper, we propose a new hybrid CNN learning method that generates the movies almost as efficiently while further reducing the error present in the generated holographic images, as compared to the method based solely on correlation learning.

Chor Shen Tay, Ken Tanizawa, Akira Hirose

Temporal Synchronization and Nonlinear Dynamics in Neural Networks (Special Session)

Sparse and Transformation-Invariant Hierarchical NMF

The hierarchical non-negative matrix factorization (HNMF) is a multilayer generative network for decomposing strictly positive data into strictly positive activations and base vectors in a hierarchical manner. However, the standard hierarchical NMF is not suited for overcomplete representations and does not code efficiently for transformations in the input data. Therefore we extend the standard HNMF by sparsity conditions and transformation-invariance in a natural, straightforward way. The idea is to factorize the input data into several hierarchical layers of activations, base vectors and transformations under sparsity constraints, leading to a less redundant and sparse encoding of the input data.

Sven Rebhan, Julian Eggert, Horst-Michael Groß, Edgar Körner
Zero-Lag Long Range Synchronization of Neurons Is Enhanced by Dynamical Relaying

How can two distant neural assemblies synchronize their firings at zero-lag even in the presence of non-negligible delays in the transfer of information between them? Here we propose a simple network module that naturally accounts for zero-lag neural synchronization for a wide range of temporal delays. In particular, we demonstrate that isochronous (without lag) millisecond precise synchronization between two distant neurons or neural populations can be achieved by relaying their dynamics via a third mediating single neuron or population.

Raul Vicente, Gordon Pipa, Ingo Fischer, Claudio R. Mirasso
Polynomial Cellular Neural Networks for Implementing the Game of Life

One-layer space-invariant Cellular Neural Networks (CNNs) are widely appreciated for their simplicity and versatility; however, such structures are not able to solve non-linearly separable problems. In this paper we show that a polynomial CNN, which has a direct VLSI implementation, is capable of dealing with the ‘Game of Life’, a Cellular Automaton with the same computational complexity as a Turing machine. Furthermore, we describe a simple design algorithm that allows the rules of a Cellular Automaton to be converted into the weights of a polynomial CNN.
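
The target behaviour is the standard Game of Life rule; a compact reference implementation of one synchronous update (independent of the CNN realization) is sketched below.

```python
import numpy as np

def life_step(grid):
    """One synchronous Game of Life update on a toroidal 0/1 grid."""
    # Count the 8 neighbours of every cell via periodic shifts.
    nbrs = sum(np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(grid.dtype)
```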

Giovanni Egidio Pazienza, Eduardo Gomez-Ramirez, Xavier Vilasís-Cardona
Deterministic Nonlinear Spike Train Filtered by Spiking Neuron Model

Deterministic nonlinear dynamics has been observed in experimental electrophysiological recordings performed in several areas of the brain. However, little is known about the ability to transmit a complex temporally organized activity through different types of spiking neurons. This study investigates the response of a spiking neuron model representing three archetypical types (regular spiking, thalamo-cortical and resonator) to input spike trains composed of deterministic (chaotic) and stochastic processes with weak background activity. The comparison of the input and output spike trains makes it possible to assess the transmission of the information contained in the deterministic nonlinear dynamics. The pattern grouping algorithm (PGA) was applied to the output of the neuron to detect the dynamical attractor embedded in the original input spike train. The results show that the thalamo-cortical neuron model can be a better candidate than the regular spiking and resonator types for transmitting temporal information in a spatially organized neural network.

Yoshiyuki Asai, Takashi Yokoi, Alessandro E. P. Villa
The Role of Internal Oscillators for the One-Shot Learning of Complex Temporal Sequences

We present an artificial neural network used to teach a robot complex temporal sequences of gestures online. The system is based on a simple temporal sequence learning architecture, a neurobiologically inspired model using some of the properties of the cerebellum and the hippocampus, plus a diversity generator composed of CTRNN oscillators. The use of oscillators makes it possible to remove the ambiguity of complex sequences: the associations with oscillators build an internal state that disambiguates the observable state. To understand the effect of this learning mechanism, we compare the performance of (i) our model, (ii) a simple sequence learning model, and (iii) the simple sequence learning model plus a competitive mechanism between inputs and oscillators. Finally, we present an experiment in which an AIBO robot learns and reproduces a sequence of gestures.

Matthieu Lagarde, Pierre Andry, Philippe Gaussier
Clustering Limit Cycle Oscillators by Spectral Analysis of the Synchronisation Matrix with an Additional Phase Sensitive Rotation

Synchrony is a phenomenon present in many complex systems of coupled oscillators. It is often important to cluster such systems into subpopulations of oscillators and to characterise the interactions therein. This article derives the clustering information from an eigenvalue decomposition of the complex synchronisation matrix. A phase-sensitive post-rotation is proposed to separate classes of oscillators that have similar frequencies but no physical interaction.

Jan-Hendrik Schleimer, Ricardo Vigário
Control and Synchronization of Chaotic Neurons Under Threshold Activated Coupling

We have studied the spatiotemporal behaviour of threshold-coupled chaotic neurons. We observe that the chaos is controlled by threshold-activated coupling, and the system yields synchronized, temporally periodic states under the threshold response. Varying the frequency of thresholding provides different higher-order periodic behaviours and can serve as a simple mechanism for stabilising a large range of regular temporal patterns in chaotic systems. Further, we have obtained a transition from spatiotemporal chaos to fixed spatiotemporal profiles by lengthening the relaxation time scale.

Manish Dev Shrimali, Guoguang He, Sudeshna Sinha, Kazuyuki Aihara
Neuronal Multistability Induced by Delay

Feedback circuits are important for understanding the emergence of patterns of neural activity. In this contribution we study how a delayed circuit representing a recurrent synaptic connection interferes with neuronal nonlinear dynamics. The neuron is modeled using a Hodgkin-Huxley type model in which the firing pattern depends on subthreshold oscillations, and the feedback is included as a time delayed linear term in the membrane voltage equation. In the regime of subthreshold oscillations the feedback amplifies the oscillation amplitude, inducing threshold crossings and firing activity that is self regularized by the delay. We also study a small neuron ensemble globally coupled through the delayed mean field. We find that the firing pattern is controlled by the delay. Depending on the delay, either all neurons fire spikes, or they all exhibit subthreshold activity, or the ensemble divides into clusters, with some neurons displaying subthreshold activity while others fire spikes.

Cristina Masoller, M. C. Torrent, Jordi García-Ojalvo
Backmatter
Metadata
Title
Artificial Neural Networks – ICANN 2007
Edited by
Joaquim Marques de Sá
Luís A. Alexandre
Włodzisław Duch
Danilo Mandic
Copyright year
2007
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-74690-4
Print ISBN
978-3-540-74689-8
DOI
https://doi.org/10.1007/978-3-540-74690-4