
2002 | Book

Artificial Neural Networks — ICANN 2002

International Conference Madrid, Spain, August 28–30, 2002 Proceedings

Edited by: José R. Dorronsoro

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

The International Conferences on Artificial Neural Networks, ICANN, have been held annually since 1991 and over the years have become the major European meeting in neural networks. This proceedings volume contains all the papers presented at ICANN 2002, the 12th ICANN conference, held on August 28–30, 2002 at the Escuela Técnica Superior de Informática of the Universidad Autónoma de Madrid and organized by its Neural Networks group. ICANN 2002 received a very high number of contributions, more than 450. Almost all papers were reviewed by three independent reviewers, selected from the more than 240 serving at this year's ICANN, and 221 papers were finally selected for publication in these proceedings (due to space considerations, quite a few good contributions had to be left out). I would like to thank the Program Committee and all the reviewers for the great collective effort and for helping us to have a high-quality conference.

Table of Contents

Frontmatter

Computational Neuroscience

Frontmatter
A Neurodynamical Theory of Visual Attention: Comparisons with fMRI- and Single-Neuron Data

We describe a model of invariant visual object recognition in the brain that incorporates different brain areas of the dorsal or ‘where’ and ventral or ‘what’ paths of the visual cortex. The dorsal ‘where’ path is implemented in the model by feedforward and feedback connections between brain areas V1, V2 and a PP module. The ventral ‘what’ path is implemented in a physiologically plausible four-layer network, corresponding to brain areas V1, V2, V4 and IT, with convergence to each part of a layer from a small region of the preceding layer, with feature-based attentional feedback connections, and with local competition between the neurons within a layer implemented by local lateral inhibition. In particular, the model explains the gradually increasing magnitude of the attentional modulation that is found in fMRI experiments from earlier visual areas (V1, V2) to higher ventral visual areas (V4, IT). The model also shows how the effective size of the receptive fields of IT neurons becomes smaller in natural cluttered scenes.

Gustavo Deco, Edmund Rolls
A Neural Model of Spatio Temporal Coordination in Prehension

The question of how the transport and grasp components of prehension are spatio-temporally coordinated is addressed in this paper. Based upon previous work by Castiello [1], we hypothesize that this coordination is carried out by neural networks in the basal ganglia that exert a sophisticated gating/modulatory function over the two visuomotor channels that, according to Jeannerod [2] and Arbib [3], are involved in the prehension movement. The spatial dimension and temporal phasing of the movement are understood in terms of basic motor programs that are re-scaled both temporally and spatially by neural activity in basal ganglia thalamocortical loops. A computational model has been developed to accommodate all these assumptions. The model proposes an interaction between the two channels that allows a distribution of cortical information from the arm transport channel to the grasp channel. Computer simulations of the model reproduce basic kinematic features of the prehension movement.

Javier Molina-Vilaplana, Jorge Feliu Batlle, Juan López Coronado
Stabilized Dynamics in Physiological and Neural Systems Despite Strongly Delayed Feedback

Interaction delays are ubiquitous in feedback systems due to finite signal conduction times. An example is the hippocampal feedback loop comprising excitatory pyramidal cells and inhibitory basket cells, where delays are introduced through synaptic, dendritic and axonal signal propagation. It is well known that in delayed recurrent systems complex periodic orbits and even chaos may occur. Here we study the case of distributed delays arising from diversity in transmission speed. Through stability considerations and numerical computations we show that feedback with distributed delays yields simpler behavior as compared to the singular delay case: oscillations may have a lower period or even be replaced by steady state behavior. The introduction of diversity in delay times may thus be a strategy to avoid complex and irregular behavior in systems where delayed regulation is unavoidable.

Andreas Thiel, Christian W. Eurich, Helmut Schwegler
Learning Multiple Feature Representations from Natural Image Sequences

Hierarchical neural networks require the parallel extraction of multiple features. This raises the question how a subpopulation of cells can become specific to one feature and invariant to another, while a different subpopulation becomes invariant to the first but specific to the second feature. Using a colour image sequence recorded by a camera mounted to a cat’s head, we train a population of neurons to achieve optimally stable responses. We find that colour sensitive cells emerge. Adding the additional objective of decorrelating the neurons’ outputs leads a subpopulation to develop achromatic receptive fields. The colour sensitive cells tend to be non-oriented, while the achromatic cells are orientation-tuned, in accordance with physiological findings. The proposed objective thus successfully separates cells which are specific for orientation and invariant to colour from orientation invariant colour cells.

Wolfgang Einhäuser, Christoph Kayser, Konrad P. Körding, Peter König
Analysis of Biologically Inspired Small-World Networks

Small-world networks are highly clustered networks with small distances between their nodes. Several well-known biological networks present this kind of connectivity. On the other hand, the usual models of small-world networks use undirected and unweighted graphs to represent the connectivity between the nodes of the network. Such graphs cannot model essential characteristics of neural networks, for example the direction or the weight of the synaptic connections. In this paper we analyze different kinds of directed graphs and show that they can also present a small-world topology when they are shifted from regular to random. Analytical expressions are also given for the cluster coefficient and the characteristic path length of these graphs.

Carlos Aguirre, Ramón Huerta, Fernando Corbacho, Pedro Pascual
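The two quantities named in the abstract can be computed directly. The sketch below is a minimal undirected version (the paper's directed, weighted extensions are not reproduced here, and all function names are illustrative): it builds a ring lattice, rewires a fraction of its edges at random, and measures the cluster coefficient and characteristic path length.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring of n nodes, each linked to its k nearest neighbours per side."""
    return {i: {(i + j) % n for j in range(1, k + 1)}
               | {(i - j) % n for j in range(1, k + 1)} for i in range(n)}

def rewire(adj, p, rng):
    """Watts-Strogatz-style rewiring: each edge is redirected with probability p."""
    n, new = len(adj), {i: set(s) for i, s in adj.items()}
    for i in range(n):
        for j in list(new[i]):
            if rng.random() < p:
                new[i].discard(j)
                new[j].discard(i)
                t = rng.randrange(n)
                while t == i or t in new[i]:
                    t = rng.randrange(n)
                new[i].add(t)
                new[t].add(i)
    return new

def clustering(adj):
    """Average cluster coefficient: fraction of a node's neighbours that are linked."""
    total = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def avg_path(adj):
    """Characteristic path length via breadth-first search from every node."""
    n, total = len(adj), 0
    for s in adj:
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

Rewiring a small fraction of the edges should lower the path length sharply while only mildly reducing the clustering, which is the small-world signature the paper examines in the directed case.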
Receptive Fields Similar to Simple Cells Maximize Temporal Coherence in Natural Video

Recently, statistical models of natural images have shown emergence of several properties of the visual cortex. Most models have considered the non-Gaussian properties of static image patches, leading to sparse coding or independent component analysis. Here we consider the basic statistical time dependencies of image sequences. We show that simple cell type receptive fields emerge when temporal response strength correlation is maximized for natural image sequences. Thus, temporal response strength correlation, which is a nonlinear measure of temporal coherence, provides an alternative to sparseness in modeling simple cell receptive field properties. Our results also suggest an interpretation of simple cells in terms of invariant coding principles that have previously been used to explain complex cell receptive fields.

Jarmo Hurri, Aapo Hyvärinen
Noise Induces Spontaneous Synchronous Aperiodic Activity in EI Neural Networks

We analyze the effect of noise on the spontaneous activity of an excitatory-inhibitory neural network model. Analytically, different regimes can be distinguished depending on the network parameters. In one of the regimes (regime B), noise induces synchronous aperiodic oscillatory activity in the isolated network, and coherent stochastic resonance phenomena occur. Activity is highly spatially correlated (synchrony); it is oscillatory on short time scales and decorrelates in time on long time scales (aperiodic). At zero noise the oscillatory activity vanishes in this regime. Changing parameters (for example, increasing the excitatory-to-excitatory connection strength), we obtain spontaneous synchronous and periodic activity even without noise (regime C). The model is in agreement with measurements of the spontaneous activity of two-dimensional cortical cell networks placed on multi-electrode arrays performed by Segev et al. [2].

Maria Marinaro, Silvia Scarpetta
Multiple Forms of Activity-Dependent Plasticity Enhance Information Transfer at a Dynamic Synapse

The information contained in the amplitude of the postsynaptic response about the relative timing of presynaptic spikes is considered using a model dynamic synapse. We show that the combination of particular forms of facilitation and depression greatly enhances information transfer at the synapse for high frequency stimuli. These dynamic mechanisms do not enhance the information if present individually. The synaptic model closely matches the behaviour of the auditory system synapse, the calyx of Held, for which accurate transmission of the timing of high frequency presynaptic spikes is essential.

Bruce Graham
Storage Capacity of Kernel Associative Memories

This contribution discusses the thermodynamic phases and storage capacity of an extension of the Hopfield-Little model of associative memory via kernel functions. The analysis is presented for the case of polynomial and Gaussian kernels in a replica-symmetry ansatz. As a general result, we find for both kernels that the storage capacity increases considerably compared to the Hopfield-Little model.

B. Caputo, H. Niemann
Macrocolumns as Decision Units

We consider a cortical macrocolumn as a collection of inhibitorily coupled minicolumns of excitatory neurons and show that its dynamics is determined by a number of stationary points, which grows exponentially with the number of minicolumns. The stability of the stationary points is governed by a single parameter of the network, which determines the number of possibly active minicolumns. The dynamics symmetrizes the activity distributed among the active columns, but if the parameter is increased, it forces this symmetry to break by switching off a minicolumn. If, for a state of maximal activity, the parameter is slowly increased, the symmetry is successively broken until just one minicolumn remains active. During such a process, minor differences between the inputs result in the activation of the minicolumn with the highest input, a feature which shows that a macrocolumn can serve as a decision and amplification unit for its inputs. We present a complete analysis of the dynamics along with computer simulations, which support the theoretical results.

Jörg Lücke, Christoph von der Malsburg, Rolf P. Würtz
Nonlinear Analysis of Simple Cell Tuning in Visual Cortex

We apply a recently developed approximation method to two standard models for orientation tuning: a one-layer model with difference-of-Gaussians connectivity and a two-layer excitatory-inhibitory network. Both models reveal identical steady states and instabilities to high firing rates. The two-field model can also lose stability through a Hopf bifurcation that results in rhythmically modulated tuning widths at frequencies in the range of 0 to 50 Hz. The network behavior is almost independent of the relative weights and widths of the kernels from excitatory to inhibitory cells and back. Formulas for tuning properties, instabilities, and oscillation frequencies are given.

Thomas Wennekers
Clustering within Integrate-and-Fire Neurons for Image Segmentation

An algorithm is developed to produce self-organisation of a purely excitatory network of Integrate-and-Fire (IF) neurons, receiving input from a visual scene. The work expands on a clustering algorithm, previously developed for Biological Oscillators, which self-organises similar oscillators into groups and then clusters these groups together. Pixels from an image are used as scalar inputs for the network, and segmented as the oscillating neurons are clustered into synchronised groups.

Phill Rowcliffe, Jianfeng Feng, Hilary Buxton
Symmetry Detection Using Global-Locally Coupled Maps

Symmetry detection by a network of coupled maps is proposed. A logistic map is associated with each pixel of an image whose symmetry is to be verified. The maps are locally and globally coupled, and the reflection-symmetry structure can be embedded in the local couplings. Computer simulations are performed using random gray-level images with different image sizes, asymmetry levels and noise intensities. Symmetry detection is also performed under dynamic scene changes. Finally, extensions of the present model and its adherence to biological systems are discussed.

Rogério de Oliveira, Luiz Henrique Alves Monteiro
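The synchronization mechanism such couplings exploit can be illustrated with a two-map toy, not the authors' full lattice: a logistic map is attached to each of two mirror-symmetric pixels, and a symmetric coupling makes the pair synchronize only when the coupling is strong enough. All parameter values below are illustrative.

```python
def logistic(x, a=4.0):
    """Fully chaotic logistic map on [0, 1]."""
    return a * x * (1.0 - x)

def iterate_pair(x, y, eps, steps):
    """Symmetrically coupled pair of logistic maps.

    With eps = 0 the maps are independent and chaotic, so nearby initial
    conditions diverge. For eps in roughly (0.25, 0.75) the difference
    contracts by at most |1 - 2*eps| * 4 per step, so the pair synchronizes,
    which is the signal a symmetry detector can read out.
    """
    for _ in range(steps):
        fx, fy = logistic(x), logistic(y)
        x, y = (1 - eps) * fx + eps * fy, (1 - eps) * fy + eps * fx
    return x, y

# Mirror pixels with nearly equal gray levels: coupled maps synchronize,
# uncoupled chaotic maps drift apart.
x, y = iterate_pair(0.3, 0.31, eps=0.4, steps=200)
u, v = iterate_pair(0.3, 0.31, eps=0.0, steps=200)
```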
Applying Slow Feature Analysis to Image Sequences Yields a Rich Repertoire of Complex Cell Properties

We apply Slow Feature Analysis (SFA) to image sequences generated from natural images using a range of spatial transformations. An analysis of the resulting receptive fields shows that they have a rich spectrum of invariances and share many properties with complex and hypercomplex cells of the primary visual cortex. Furthermore, the dependence of the solutions on the statistics of the transformations is investigated.

Pietro Berkes, Laurenz Wiskott
Combining Multimodal Sensory Input for Spatial Learning

For robust self-localisation in real environments autonomous agents must rely upon multimodal sensory information. The relative importance of a sensory modality is not constant during the agent-environment interaction. We study the interrelation between visual and tactile information in a spatial learning task. We adopt a biologically inspired approach to detect multimodal correlations based on the properties of neurons in the superior colliculus. Reward-based Hebbian learning is applied to train an active gating network to weigh individual senses depending on the current environmental conditions. The model is implemented and tested on a mobile robot platform.

Thomas Strösslin, Christophe Krebser, Angelo Arleo, Wulfram Gerstner
A Neural Network Model Generating Invariance for Visual Distance

We present a neural network mechanism allowing for distance-invariant recognition of visual objects. The term distance-invariance refers to the toleration of changes in retinal image size that are due to varying view distances, as opposed to varying real-world object size. We propose a biologically plausible network model, based on the recently demonstrated spike-rate modulations of large numbers of neurons in striate and extra-striate visual cortex by viewing distance. In this context, we introduce the concept of distance complex cells. Our model demonstrates the capability of distance-invariant object recognition, and of resolving conflicts that other approaches to size-invariant recognition do not address.

Rüdiger Kupper, Reinhard Eckhorn
Modeling Neural Control of Locomotion: Integration of Reflex Circuits with CPG

A model of the spinal cord neural circuitry for control of cat hindlimb movements during locomotion was developed. The neural circuitry in the spinal cord was modeled as a network of interacting neuronal modules (NMs). All neurons were modeled in Hodgkin-Huxley style. Each NM included an α-motoneuron, Renshaw, Ia and Ib interneurons, and two interneurons associated with the central pattern generator (CPG). The CPG was integrated with reflex circuits. Each three-joint hindlimb was actuated by nine one- and two-joint muscles. Our simulation allowed us to find (and hence to suggest) an architecture of network connections within and between the NMs and a schematic of feedback connections to the spinal cord neural circuitry from muscles (Ia and Ib types) and touch sensors that provided a stable locomotion with different gaits, realistic patterns of muscle activation, and kinematics of limb movements.

Ilya A. Rybak, Dmitry G. Ivashko, Boris I. Prilutsky, M. Anthony Lewis, John K. Chapin
Comparing the Information Encoded by Different Brain Areas with Functional Imaging Techniques

We study the suitability of estimating the information conveyed by the responses of the populations of neurons in the brain by using the signal provided by imaging techniques like functional Magnetic Resonance Imaging (fMRI). The fMRI signal is likely to reflect a spatial averaging of the neuronal population activity. On the other hand, knowledge of the activity of each single neuron is needed in order to calculate the information. We explore this potential limitation by means of a simple computational model based on known tuning properties of individual neurons. We investigate the relationship between the information transmitted by the population, the signal change and the signal information as a function of the neuronal parameters. We find that the relationship is in general very different from linear. This result should be taken into account when comparing the information encoded by different brain areas with functional imaging techniques.

Angel Nevado, Malcolm P. Young, Stefano Panzeri
Mean-Field Population Dynamics of Spiking Neurons with Random Synaptic Delays

We derive a dynamical equation for the spike emission rate ν(t) of a homogeneous population of Integrate-and-Fire (IF) neurons, in an “extended” mean-field approximation (i.e., taking into account both the mean and the variance of the afferent current). Conditions for stability and characteristic times of the population transient response are investigated, and both are shown to be naturally expressed in terms of the single-neuron current-to-rate transfer function. Finite-size effects are incorporated by a stochastic extension of the mean-field equations and the associated Fokker-Planck formalism, and their implications for the frequency response of the population activity are illustrated through the power spectral density of ν(t). The role of synaptic delays in spike transmission is studied for an arbitrary distribution of delays.

Maurizio Mattia, Paolo Del Giudice
Stochastic Resonance and Finite Resolution in a Network of Leaky Integrate-and-Fire Neurons

This paper discusses the effect of stochastic resonance in a network of leaky integrate-and-fire (LIF) neurons and investigates its realisation on a Field Programmable Gate Array (FPGA). We report that stochastic resonance, which is mainly associated with floating-point implementations, is possible in both a single LIF neuron and a network of LIF neurons implemented on lower-resolution, integer-based digital hardware. We also report that such a network can improve the signal-to-noise ratio (SNR) of the output over that of a single LIF neuron.

Nhamo Mtetwa, Leslie S. Smith, Amir Hussain
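The ingredient behind stochastic resonance can be sketched in a few lines, here as a floating-point simulation rather than the paper's fixed-point FPGA implementation: a subthreshold periodic current never fires the neuron on its own, but added noise produces spikes that carry the signal. All parameter values are illustrative.

```python
import math
import random

def lif_spikes(noise_std, seed=1, t_max=2.0, dt=1e-3):
    """Leaky integrate-and-fire neuron driven by a subthreshold sine
    plus Gaussian white noise; returns the list of spike times."""
    rng = random.Random(seed)
    tau, v_rest, v_th, v_reset = 0.02, 0.0, 1.0, 0.0
    i_dc, i_amp, freq = 0.6, 0.3, 5.0     # i_dc + i_amp < v_th: subthreshold
    v, spikes = v_rest, []
    for step in range(int(t_max / dt)):
        t = step * dt
        i = (i_dc + i_amp * math.sin(2 * math.pi * freq * t)
             + noise_std * rng.gauss(0, 1) / math.sqrt(dt))
        v += dt * (-(v - v_rest) + i) / tau
        if v >= v_th:              # threshold crossing: emit spike and reset
            spikes.append(t)
            v = v_reset
    return spikes
```

With `noise_std=0.0` the membrane tracks the input and stays below threshold, so no spikes occur; a moderate noise level makes threshold crossings cluster near the peaks of the sine, which is the effect the SNR measurements in the paper quantify.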
Reducing Communication for Distributed Learning in Neural Networks

A learning algorithm is presented for circuits consisting of a single layer of perceptrons. We refer to such circuits as parallel perceptrons. In spite of their simplicity, these circuits are universal approximators for arbitrary boolean and continuous functions. In contrast to backprop for multi-layer perceptrons, our new learning algorithm, the parallel delta rule (p-delta rule), only has to tune a single layer of weights, and it does not require the computation and communication of analog values with high precision. Reduced communication also distinguishes our new learning rule from other learning rules for such circuits, such as those traditionally used for MADALINE. A theoretical analysis shows that the p-delta rule does in fact implement gradient descent with regard to a suitable error measure, although it does not require the computation of derivatives. Furthermore, it is shown through experiments on common real-world benchmark datasets that its performance is competitive with that of other learning approaches from neural networks and machine learning. Thus our algorithm also provides an interesting new hypothesis for the organization of learning in biological neural systems.

Peter Auer, Harald Burgsteiner, Wolfgang Maass
Flow Diagrams of the Quadratic Neural Network

The macroscopic dynamics of an extremely diluted three-state neural network based on mutual information and mean-field theory arguments is studied in order to establish the stability of the stationary states. Results are presented in terms of the pattern-recognition overlap, the neural activity, and the activity-overlap. It is shown that the presence of synaptic noise is essential for the stability of states that recognize only the active patterns when the full structure of the patterns is not recognizable. Basins of attraction of considerable size are obtained in all cases for a not too large storage ratio of patterns.

David R. C. Dominguez, E. Korutcheva, W. K. Theumann, R. Erichsen Jr.
Dynamics of a Plastic Cortical Network

The collective behavior of a network of spiking neurons connected by plastic synapses, modeling a cortical module, is studied. A detailed spike-driven synaptic dynamics is simulated in a large network of spiking neurons, implementing the full double dynamics of neurons and synapses. The repeated presentation of a set of external stimuli is shown to structure the network to the point of sustaining selective delay activity. When the synaptic dynamics is analyzed as a function of pre- and post-synaptic spike rates in functionally defined populations, it reveals a novel variation of the Hebbian plasticity paradigm: in any functional set of synapses between pairs of neurons (stimulated-stimulated, stimulated-delay, stimulated-spontaneous, etc.) there is a finite probability of potentiation as well as of depression. This leads to a saturation of potentiation or depression at the level of the ratio of the two probabilities, preventing uncontrolled growth of the number of potentiated synapses. When one of the two probabilities is very high relative to the other, the familiar Hebbian mechanism is recovered.

Gianluigi Mongillo, Daniel J. Amit
Non-monotonic Current-to-Rate Response Function in a Novel Integrate-and-Fire Model Neuron

A novel integrate-and-fire model neuron is proposed to account for a non-monotonic f-I response function, as experimentally observed. As opposed to classical forms of adaptation, in the present integrate-and-fire model the spike-emission process incorporates a state-dependent inactivation that makes the probability of emitting a spike decrease as a function of the mean depolarization level instead of the mean firing rate.

Michele Giugliano, Giancarlo La Camera, Alexander Rauch, Hans-Rudolf Lüscher, Stefano Fusi
Small-World Effects in Lattice Stochastic Diffusion Search

Stochastic Diffusion Search is an efficient probabilistic best-fit search technique, capable of transformation-invariant pattern matching. Although inherently parallel in operation, it is difficult to implement efficiently in hardware as it requires full inter-agent connectivity. This paper describes a lattice implementation which, while qualitatively retaining the properties of the original algorithm, restricts connectivity, enabling simpler implementation on parallel hardware. Diffusion times are examined for different network topologies, ranging from ordered lattices through small-world networks to random graphs.

Kris De Meyer, J. Mark Bishop, Slawomir J. Nasuto
A Direction Sensitive Network Based on a Biophysical Neurone Model

In our understanding, modelling the dynamics of brain functions at the cell level is essential to developing both a deeper understanding and classification of the experimental data and a guideline for further research. This paper presents the implementation and training of a direction-sensitive network on the basis of a biophysical neurone model including synaptic excitation, dendritic propagation and action-potential generation. The underlying model not only describes the functional aspects of neural signal processing, but also provides insight into the underlying energy consumption. Moreover, the training data set has been recorded by means of a real robotic system, thus bridging the gap to technical applications.

Burkhard Iske, Axel Löffler, Ulrich Rückert
Characterization of Triphasic Rhythms in Central Pattern Generators (I): Interspike Interval Analysis

Central pattern generator (CPG) neurons produce patterned signals to drive rhythmic behaviors in a robust and flexible manner. In this paper we use a well-known CPG circuit and two different models of spiking-bursting neurons to analyze the presence of individual signatures in the behavior of the network. These signatures consist of characteristic interspike-interval profiles in the activity of each cell. The signatures arise within the particular triphasic rhythm generated by the CPG network. We discuss the origin and role of this type of individuality observed in these circuits.

Roberto Latorre, Francisco B. Rodríguez, Pablo Varona
Characterization of Triphasic Rhythms in Central Pattern Generators (II): Burst Information Analysis

Central pattern generators (CPGs) are neural circuits that produce patterned signals to drive rhythmic behaviors in a robust and flexible manner. In this paper we analyze the triphasic rhythm of a well-known CPG circuit using two different models of spiking-bursting neurons and several network topologies. By means of a measure of mutual information we calculate the degree of information exchange in the bursting activity between neurons. We discuss the precision and robustness of different network configurations.

Francisco B. Rodríguez, Roberto Latorre, Pablo Varona
Neural Coding Analysis in Retinal Ganglion Cells Using Information Theory

Information theory is used to analyze the neural code of retinal ganglion cells. This approach can quantify the amount of information transmitted by the whole population versus single cells. The redundancy inherent in the code can be determined by obtaining the information bits of increasingly large sets of cells and by analyzing the relation between the joint information and the sum of the information conveyed by single cells. The results support the view that redundancy may play a crucial role in visual information processing.

J. M. Ferrández, M. Bongard, F. García de Quirós, J. A. Bolea, E. Fernández
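The redundancy comparison described in the abstract, joint information versus the sum of single-cell information, can be sketched on a toy discrete model; the stimulus and response distributions below are invented for illustration and are not the retinal data of the paper.

```python
import math
from collections import defaultdict

def mutual_info(pxy):
    """I(X;Y) in bits, computed exactly from a joint distribution {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Toy model: a binary stimulus, one cell reporting it correctly with prob. 0.9.
single = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

# A second cell that always copies the first adds no information to the
# population response (s, (r1, r2)), so the code is fully redundant.
joint = {(s, (r, r)): p for (s, r), p in single.items()}

i_single = mutual_info(single)
i_joint = mutual_info(joint)
redundancy = 2 * i_single - i_joint   # sum of single-cell info minus joint info
```

A positive `redundancy` indicates overlap in what the cells convey; for independent noise the joint information would instead approach the sum of the single-cell terms.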
Firing Rate Adaptation without Losing Sensitivity to Input Fluctuations

Spike frequency adaptation is an important cellular mechanism by which neocortical neurons accommodate their responses to transient, as well as sustained, stimulations. This can be quantified by the slope reduction in the f-I curves due to adaptation. When the neuron is driven by a noisy, in vivo-like current, adaptation might also affect the sensitivity to the fluctuations of the input. We investigate how adaptation, due to a calcium-dependent potassium current, affects the dynamics of the depolarization, as well as the stationary f-I curves, of a white-noise-driven integrate-and-fire model neuron. In addition to decreasing the slope of the f-I curves, adaptation of this type preserves the sensitivity of the neuron to the fluctuations of the input.

Giancarlo La Camera, Alexander Rauch, Walter Senn, Hans-R. Lüscher, Stefano Fusi
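The slope reduction in the f-I curve can be reproduced in a minimal deterministic sketch, not the authors' white-noise-driven model: each spike increments a slow variable standing in for the calcium-dependent potassium conductance, and the resulting hyperpolarizing current lowers the steady-state rate. All parameter values are illustrative.

```python
def firing_rate(i_inj, g_adapt, t_max=5.0, dt=1e-4):
    """Steady firing rate (Hz) of an LIF neuron with an adaptation current.

    Each spike increments the slow variable `a`; the term -g_adapt * a then
    acts like a calcium-dependent potassium current, reducing the rate.
    """
    tau, tau_a, v_th, v_reset = 0.02, 0.1, 1.0, 0.0
    v, a, spikes = 0.0, 0.0, 0
    for _ in range(int(t_max / dt)):
        v += dt * (-v + i_inj - g_adapt * a) / tau
        a += dt * (-a / tau_a)         # adaptation variable decays slowly
        if v >= v_th:
            spikes += 1
            v = v_reset
            a += 1.0                   # spike-triggered increment
    return spikes / t_max

rate_plain = firing_rate(2.0, g_adapt=0.0)
rate_adapt = firing_rate(2.0, g_adapt=0.5)
```

Sweeping `i_inj` with and without `g_adapt` traces two f-I curves whose slopes differ, the quantity the abstract uses to measure adaptation.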
Does Morphology Influence Temporal Plasticity?

Applying a bounded, weight-independent temporal plasticity rule to synapses from independently Poisson-firing presynaptic neurons onto a conductance-based integrate-and-fire neuron leads to a bimodal distribution of synaptic strengths (Song et al., 2000). We extend this model to investigate the effects of spreading the synapses over the dendritic tree. The results suggest that distal synapses tend to lose out to proximal ones in the competition for synaptic strength. Against expectations, versions of the plasticity rule with a smoother transition between potentiation and depression make little difference to the distribution or lead to all synapses losing.

David C. Sterratt, Arjen van Ooyen
Attractor Neural Networks with Hypercolumns

We investigate attractor neural networks with a modular structure, where a local winner-takes-all rule acts within the modules (called hypercolumns). We make a signal-to-noise analysis of storage capacity and noise tolerance, and compare the results with those from simulations. Introducing local winner-takes-all dynamics improves storage capacity and noise tolerance, while the optimal size of the hypercolumns depends on network size and noise level.

Christopher Johansson, Anders Sandberg, Anders Lansner
Edge Detection and Motion Discrimination in the Cuneate Nucleus

In this paper we investigate how the cuneate nucleus could perform edge detection as well as motion discrimination by means of a single layer of multi-threshold cuneothalamic neurons. A well-known center-surround receptive field organization is in charge of edge detection, whereas single neuronal processing integrates inhibitory and excitatory inputs over time to discriminate dynamic stimuli. The simulations show how lateral inhibition determines a sensitized state in neighbouring neurons which respond to dynamic patterns with a burst of spikes.

Eduardo Sánchez, S. Barro, A. Canedo
Encoding the Temporal Statistics of Markovian Sequences of Stimuli in Recurrent Neuronal Networks

Encoding, storing, and recalling a temporal sequence of stimuli in a neuronal network can be achieved by creating associations between pairs of stimuli that are contiguous in time. This idea is illustrated by studying the behavior of a neural network model with binary neurons and binary stochastic synapses. The network extracts in an unsupervised manner the temporal statistics of the sequence of input stimuli. When a stimulus triggers the recalling process, the statistics of the output patterns reflects those of the input. If the sequence of stimuli is generated through a Markov process, then the network dynamics faithfully reproduces all the transition probabilities.

Alessandro Usseglio Viretta, Stefano Fusi, Shih-Chii Liu
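The core idea, associating temporally contiguous stimuli so that the network's recall statistics match the input statistics, can be sketched as transition counting over a hypothetical two-state chain; the binary-synapse network of the paper is not reproduced here, and all names are illustrative.

```python
import random
from collections import defaultdict

def learn_transitions(seq):
    """Associate temporally contiguous stimuli by counting transitions,
    then normalize the counts into estimated transition probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def markov_chain(p, start, steps, rng):
    """Sample a stimulus sequence from a transition matrix {s: {s2: prob}}."""
    seq, s = [start], start
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, q in p[s].items():
            acc += q
            if r < acc:
                s = nxt
                break
        seq.append(s)
    return seq

# Unsupervised extraction: the learned matrix should approach the generator's.
p_true = {'A': {'A': 0.8, 'B': 0.2}, 'B': {'A': 0.4, 'B': 0.6}}
est = learn_transitions(markov_chain(p_true, 'A', 20000, random.Random(42)))
```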
Multi-stream Exploratory Projection Pursuit for the Formation of Complex Cells Similar to Visual Cortical Neurons

A multi-stream extension of Exploratory Projection Pursuit is proposed as a method for the formation of local, spatiotemporal, oriented filters similar to complex cells found in the visual cortex. This algorithm, which we call the Exploratory Correlation Analysis (ECA) network, is derived to maximise dependencies between separate, but related, data streams. By altering the functions on the outputs of the ECA network we can explore different forms of shared, higher order structure in multiple data streams.

Darryl Charles, Jos Koetsier, Donald MacDonald, Colin Fyfe
A Corticospinal Network for Control of Voluntary Movements of a Physiologically Based Experimental Platform

In this paper, we present a corticospinal network for the control of voluntary movements within constraints from neurophysiology. A neural controller is proposed to follow desired joint trajectories of a single link controlled by an agonist-antagonist pair of actuators with muscle-like properties. This work involves the design and implementation of an efficient biomechanical model of the animal muscular actuation system, in which a mathematical model of whole skeletal muscle force generation is implemented on DC motors. Through experimental results, we show that the neural controller exhibits key kinematic properties of human movements, including dynamics compensation and asymmetric bell-shaped velocity profiles. The controller suggests how the brain may set automatic and volitional gating mechanisms to vary the balance of static and dynamic feedback information, in order to guide the movement command and to compensate for external forces.

Francisco García-Córdova, Javier Molina-Vilaplana, Juan López-Coronado
Firing Rate for a Generic Integrate-and-Fire Neuron with Exponentially Correlated Input

The effect of time correlations in the afferent current on the firing rate of a generalized integrate-and-fire neuron model is studied. When the correlation time τc is small enough, the firing rate can be calculated analytically for small values of the correlation amplitude α². It is shown that the rate decreases as $\sqrt{\tau_c}$ from its value at τc = 0. This limiting behavior is universal for integrate-and-fire neurons driven by exponentially correlated Gaussian input; the details of the model determine only the pre-factor multiplying $\sqrt{\tau_c}$. Two model examples are discussed.

Rubén Moreno, Néstor Parga
Iterative Population Decoding Based on Prior Beliefs

We propose a framework for investigation of the modulation of neural coding/decoding by the availability of prior information on the stimulus statistics. In particular, we describe a novel iterative decoding scheme for a population code that is based on prior information. It can be viewed as a generalization of the Richardson-Lucy algorithm to include degrees of belief that the encoding population encodes specific features. The method is applied to a signal detection task and it is verified that - in comparison to standard maximum-likelihood decoding - the procedure significantly enhances performance of an ideal observer if appropriate prior information is available. Moreover, the model predicts that high prior probabilities should lead to a selective sharpening of the tuning profiles of the corresponding recurrent weights similar to the shrinking of receptive fields under attentional demands that has been observed experimentally.
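As a point of reference, the classical Richardson-Lucy iteration that the decoding scheme generalizes can be sketched as follows; the Gaussian tuning curves and grid sizes are illustrative choices, not taken from the paper:

```python
import numpy as np

# Population "point spread": Gaussian tuning curves on a 1-D stimulus grid.
n = 50
grid = np.linspace(0, 1, n)
A = np.exp(-(grid[:, None] - grid[None, :])**2 / (2 * 0.05**2))
A /= A.sum(axis=0, keepdims=True)              # normalise each column

true_x = np.exp(-(grid - 0.3)**2 / (2 * 0.05**2))  # underlying stimulus profile
r = A @ true_x                                  # blurred population response

x = np.ones(n)                                  # flat (uninformative) start
for _ in range(200):
    x *= A.T @ (r / (A @ x + 1e-12))            # multiplicative R-L update

# The deconvolved estimate recovers the location of the stimulus peak.
assert abs(int(np.argmax(x)) - int(np.argmax(true_x))) <= 1
```

The multiplicative form keeps the estimate non-negative, which is why this family of updates is natural for firing-rate population codes.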

Jens R. Otterpohl, K. Pawelzik
When NMDA Receptor Conductances Increase Inter-spike Interval Variability

We analyze extensively the temporal properties of the train of spikes emitted by a simple model neuron as a function of the statistics of the synaptic input. In particular we focus on the asynchronous case, in which the synaptic inputs are random and uncorrelated. We show that the NMDA component acts as a non-stationary input that varies on longer time scales than the inter-spike intervals. In the sub-threshold regime, this can increase dramatically the coefficient of variability (bringing it beyond one). The analysis provides also simple guidelines for searching parameters that maximize irregularity.

Giancarlo La Camera, Stefano Fusi, Walter Senn, Alexander Rauch, Hans-R. Lüscher
Spike-Driven Synaptic Plasticity for Learning Correlated Patterns of Asynchronous Activity

Long term synaptic changes induced by neural spike activity are believed to underlie learning and memory. Spike-driven long term synaptic plasticity has been investigated in simplified situations in which the patterns of asynchronous activity to be encoded were statistically independent. An extra regulatory mechanism is required to extend the learning capability to more complex and natural stimuli. This mechanism is provided by the effects of the action potentials that are believed to be responsible for spike-timing dependent plasticity. These effects, when combined with the dependence of synaptic plasticity on the post-synaptic depolarization, produce the learning rule needed for storing correlated patterns of asynchronous neuronal activity.

Stefano Fusi
A Model of Human Cortical Microcircuits for the Study of the Development of Epilepsy

Brain lesions and structural abnormalities may lead to the development of epilepsy. However, it is not well known which specific alterations in cortical circuitry are necessary to create an epileptogenic region. In the present work we use computer simulations to test the hypothesis that the loss of chandelier cells, which are powerful inhibitory interneurons, might be a key element in the development of seizures in epileptic patients. We used circuit diagrams based on real data to model a 0.5 mm3 region of human neocortical tissue. We found that a slight decrease in the number of chandelier cells may cause epileptiform activity in the network. However, when this decrease affects other cell types, the global behaviour of the model is not qualitatively altered. Thus, our work supports the hypothesis that chandelier cells are fundamental in the development of epilepsy.

Manuel Sánchez-Montañés, Luis F. Lago-Fernández, Nazareth P. Castellanos, Ángel Merchán-Pérez, Jon I. Arellano, Javier DeFelipe
On the Computational Power of Neural Microcircuit Models: Pointers to the Literature

This paper provides references for my invited talk on the computational power of neural microcircuit models.

Wolfgang Maass

Connectionist Cognitive Science

Frontmatter
Networking with Cognitive Packets

This paper discusses a novel packet computer network architecture, a “Cognitive Packet Network (CPN)”, in which intelligent capabilities for routing and flow control are moved towards the packets, rather than being concentrated in the nodes. The routing algorithm in CPN uses reinforcement learning based on the Random Neural Network. We outline the design of CPN and show how it incorporates packet loss and delay directly into user Quality of Service (QoS) criteria, and use these criteria to conduct routing. We then present our experimental test-bed and report on extensive measurement experiments. These experiments include measurements of the network under link and node failures. They illustrate the manner in which neural network based CPN can be used to support a reliable adaptive network environment for peer-to-peer communications over an unreliable infrastructure.

Erol Gelenbe, Ricardo Lent, Zhiguang Xu
Episodic Memory: A Connectionist Interpretation

The strengths and weaknesses of episodic memory and its analysis when represented in symbolic terms are indicated. Some aspects may be more easily implemented in a connectionist form. Those which pose a challenge to a PDP system are presented in the context of recent neurobiological work. A theoretical, connectionist account of episodic memory is outlined. This assigns a critical role to hebbosomes in individual neurons as a means of establishing episodic memory records and relies on interaction between episodic memories in temporal and entorhinal cortices and the hippocampus as a determinant of modification of semantic memory.

J. G. Wallace, K. Bluff
Action Scheme Scheduling with a Neural Architecture: A Prefrontal Cortex Approach

This paper presents a computational model addressing behavioral learning and planning with a fully neural approach. The prefrontal functionality that is modeled is the ability to schedule elementary action schemes to reach behavioral goals. The use of robust context detection is discussed, as well as relations to biological views of the prefrontal cortex.

Hervé Frezza-Buet
Associative Arithmetic with Boltzmann Machines: The Role of Number Representations

This paper presents a study on associative mental arithmetic with mean-field Boltzmann Machines. We examined the role of number representations, showing theoretically and experimentally that cardinal number representations (e.g., numerosity) are superior to symbolic and ordinal representations w.r.t. learnability and cognitive plausibility. Only the network trained on numerosities exhibited the problem-size effect, the core phenomenon in human behavioral studies. These results urge a reevaluation of current cognitive models of mental arithmetic.

Ivilin Stoianov, Marco Zorzi, Suzanna Becker, Carlo Umilta
Learning the Long-Term Structure of the Blues

In general, music composed by recurrent neural networks (RNNs) suffers from a lack of global structure. Though networks can learn note-by-note transition probabilities and even reproduce phrases, they have been unable to learn an entire musical form and use that knowledge to guide composition. In this study, we describe model details and present experimental results showing that LSTM successfully learns a form of blues music and is able to compose novel (and, some listeners believe, pleasing) melodies in that style. Remarkably, once the network has found the relevant structure it does not drift from it: LSTM is able to play the blues with good timing and proper structure as long as one is willing to listen.

Douglas Eck, Jürgen Schmidhuber
Recursive Neural Networks Applied to Discourse Representation Theory

Connectionist semantic modeling in natural language processing (a typically symbolic domain) is still a challenging problem. This paper introduces a novel technique combining Discourse Representation Theory (DRT) with Recursive Neural Networks (RNN) in order to yield a neural model capable of discovering properties and relationships among constituents of a knowledge base expressed by natural language sentences. DRT transforms sequences of sentences into directed ordered acyclic graphs, while RNNs are trained to deal with such structured data. The acquired information allows the network to answer questions whose answers are not directly expressed in the knowledge base. A simple experimental demonstration, drawn from the context of a fairy tale, is presented. Finally, ongoing research directions are pointed out.

Antonella Bua, Marco Gori, Fabrizio Santini
Recurrent Neural Learning for Helpdesk Call Routing

In the past, recurrent networks have been used mainly in neurocognitive or psycholinguistically oriented approaches to language processing. Here we examine recurrent neural networks for their potential in a difficult spoken language classification task. This paper describes an approach to learning the classification of recorded operator-assistance telephone utterances. We explore simple recurrent networks using a large, unique telecommunication corpus of spontaneous spoken language. Performance of the network indicates that a semantic SRN network is quite useful for learning the classification of spontaneous spoken language in a robust manner, which may lead to its use in helpdesk call routing.

Sheila Garfield, Stefan Wermter
An Approach to Encode Multilayer Perceptrons

Genetic connectionism is based on the integration of evolution and neural network learning within one system. An overview of Multilayer Perceptron encoding schemes is presented. A new approach is shown and tested on various case studies. The proposed genetic search not only optimizes the network topology but also shortens the training time. There is no doubt that genetic algorithms can be used to solve efficiently the problem of network optimization, considering not only static aspects of network architecture but also dynamic ones.

Jerzy Korczak, Emmanuel Blindauer
Dynamic Knowledge Representation in Connectionist Systems

One of the most pervasive concepts underlying computational models of information processing in the brain is linear input integration of rate-coded univariate information by neurons. After a suitable learning process this results in neuronal structures that statically represent knowledge as a vector of real-valued synaptic weights. Although this general framework has contributed to the many successes of connectionism, in this paper we argue that for all but the most basic cognitive processes, a more complex, multivariate dynamic neural coding mechanism is required: knowledge should not be spatially bound to a particular neuron or group of neurons. We conclude the paper with a discussion of a simple experiment that illustrates dynamic knowledge representation in a spiking neuron connectionist system.

J. Mark Bishop, Slawomir J. Nasuto, Kris De Meyer
Generative Capacities of Cellular Automata Codification for Evolution of NN Codification

Automatic methods for designing artificial neural nets are desirable to avoid the laborious and error-prone job of the human expert. Evolutionary computation has been used as a search technique to find appropriate NN architectures. Direct and indirect encoding methods are used to codify the net architecture into the chromosome. A reformulation of an indirect encoding method, based on two bi-dimensional cellular automata, and its generative capacity are presented.

Germán Gutiérrez, Inés M. Galván, José M. Molina, Araceli Sanchis

Data Analysis and Pattern Recognition

Frontmatter
Entropic Measures with Radial Basis Units

Two new entropic measures are proposed: the A-entropy and the E-entropy, which are compared during competitive training processes in multilayer networks with radial basis units. The behavior of these entropies is a good indicator of the orthogonality reached in the layer representations for vector quantization tasks. The proposed E-entropy is a good candidate for a measure of the training level reached by all layers in the same training process. Both measures can serve to monitor the competitive learning in this kind of neural model, which is usually implemented in the hidden layers of Radial Basis Function networks.

J. David Buldain
New Methods for Splice Site Recognition

Splice sites are locations in DNA which separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors thus form important components of computational gene finders. We pose splice site recognition as a classification problem with the classifier learnt from a labeled data set consisting of only local information around the potential splice site. Note that finding the correct position of splice sites without using global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates and another uses an extension to the well-known Fisher-kernel. We find excellent performance on both data sets.
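The idea of comparing sequences through inner products of local substring counts can be illustrated with a plain k-mer spectrum kernel; this is a generic sketch, not the specially designed kernels (the TIS-derived kernel and the Fisher-kernel extension) used by the authors:

```python
import numpy as np
from itertools import product

def spectrum(seq, k=3, alphabet="ACGT"):
    """Vector of k-mer counts; the kernel is the inner product of two such vectors."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    v = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    return v

# Local windows around a candidate splice site would play the role of these strings.
s1, s2, s3 = "ACGTACGTAG", "ACGTACGTAC", "TTTTTTTTTT"
k12 = spectrum(s1) @ spectrum(s2)   # similar windows share many k-mers
k13 = spectrum(s1) @ spectrum(s3)   # dissimilar windows share none
assert k12 > k13
```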

Sören Sonnenburg, Gunnar Rätsch, Arun Jagota, Klaus-Robert Müller
A Weak Condition on Linear Independence of Unscaled Shifts of a Function and Finite Mappings by Neural Networks

Let 1 ≤ c ≤ d and let g be a slowly increasing function defined on R^c. Suppose that the support of the Fourier transform $\mathcal{F}_c g$ of g includes a converging sequence of distinct points y_k which sufficiently rapidly come close to a line as k → ∞. Then, any mapping of any number, say n, of any points x_1, ..., x_n in R^d onto R can be implemented by a linear sum of the form $\sum_{j=1}^{n} a_j g(Wx_i + z_j)$. Here, W is a d × c matrix having orthonormal row vectors, implying that g is used without scaling, and that the sigmoid function defined on R and the radial basis function defined on R^d are treated on a common basis.

Yoshifusa Ito
Identification of Wiener Model Using Radial Basis Functions Neural Networks

A new method is introduced for the identification of the Wiener model, which consists of a linear dynamic block followed by a static nonlinearity. The nonlinearity and the linear dynamic part of the model are identified using a radial basis function neural network (RBFNN) and an autoregressive moving average (ARMA) model, respectively. The new algorithm makes use of the well-known mapping ability of RBFNNs. A learning algorithm based on the least mean squares (LMS) principle is derived for training the identification scheme. The proposed algorithm estimates the weights of the RBFNN and the coefficients of the ARMA model simultaneously.

Ali Syed Saad Azhar, Hussain N. Al-Duwaish
A New Learning Algorithm for Mean Field Boltzmann Machines

We present a new learning algorithm for Mean Field Boltzmann Machines based on the contrastive divergence optimization criterion. In addition to minimizing the divergence between the data distribution and the equilibrium distribution, we maximize the divergence between one-step reconstructions of the data and the equilibrium distribution. This eliminates the need to estimate equilibrium statistics, so we do not need to approximate the multimodal probability distribution of the free network with the unimodal mean field distribution. We test the learning algorithm on the classification of digits.
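For orientation, one-step contrastive divergence in its most familiar form, CD-1 on a small restricted Boltzmann machine with sampled hidden states rather than the authors' mean-field machinery, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: two repeated binary patterns the model should learn.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 50, dtype=float)
n_vis, n_hid, lr = 4, 3, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
a = np.zeros(n_vis)                       # visible biases
b = np.zeros(n_hid)                       # hidden biases

for epoch in range(200):
    for v0 in data:
        h0 = sigmoid(v0 @ W + b)                          # positive phase
        h_samp = (rng.random(n_hid) < h0).astype(float)
        v1 = sigmoid(W @ h_samp + a)                      # one-step reconstruction
        h1 = sigmoid(v1 @ W + b)
        W += lr * (np.outer(v0, h0) - np.outer(v1, h1))   # CD-1 update
        a += lr * (v0 - v1)
        b += lr * (h0 - h1)

# Reconstructions of the training patterns should be close to the inputs.
recon = sigmoid(sigmoid(data @ W + b) @ W.T + a)
assert np.mean((recon - data)**2) < 0.1
```

The point of CD is visible in the update rule: equilibrium statistics are replaced by statistics of one-step reconstructions, which is what removes the need for the unimodal mean-field approximation of the free-running network.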

Max Welling, Geoffrey E. Hinton
A Continuous Restricted Boltzmann Machine with a Hardware-Amenable Learning Algorithm

This paper proposes a continuous stochastic generative model that offers an improved ability to model analogue data, with a simple and reliable learning algorithm. The architecture forms a continuous restricted Boltzmann Machine, with a novel learning algorithm. The capabilities of the model are demonstrated with both artificial and real data.

Hsin Chen, Alan Murray
Human Recognition by Gait Analysis Using Neural Networks

This paper presents a new method to recognize people by their gait, which forms part of a larger project to automatically detect and recognize different human behaviours. The project comprises several stages, but this paper focuses on the last one, recognition. This stage is based on the Self-Organizing Map, assuming that the previous stage of feature extraction has already been solved. Although this previous stage is solved here by manual extraction of human model points, the results obtained demonstrate the viability of the neural approach to the recognition of this kind of temporal sequence.

J. Elías Herrero-Jaraba, Carlos Orrite-Uruñuela, David Buldain, Armando Roy-Yarza
Learning Vector Quantization for Multimodal Data

Learning vector quantization (LVQ) as proposed by Kohonen is a simple and intuitive, though very successful prototype-based clustering algorithm. Generalized relevance LVQ (GRLVQ) constitutes a modification which obeys the dynamics of a gradient descent and allows an adaptive metric utilizing relevance factors for the input dimensions. As iterative algorithms with local learning rules, LVQ and modifications crucially depend on the initialization of the prototypes. They often fail for multimodal data. We propose a variant of GRLVQ which introduces ideas of the neural gas algorithm incorporating a global neighborhood coordination of the prototypes. The resulting learning algorithm, supervised relevance neural gas, is capable of learning highly multimodal data, whereby it shares the benefits of a gradient dynamics and an adaptive metric with GRLVQ.
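The prototype dynamics that LVQ and its variants build on can be sketched with plain LVQ1 (attract the winning prototype for a correctly labeled sample, repel it otherwise); relevance factors and the neural-gas neighborhood coordination of the paper are omitted in this illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two Gaussian classes in 2-D; one prototype per class.
X0 = rng.normal([0, 0], 0.3, size=(100, 2))
X1 = rng.normal([2, 2], 0.3, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

protos = rng.normal(1.0, 0.1, size=(2, 2))   # deliberately poor initialization
labels = np.array([0, 1])
lr = 0.05

for epoch in range(30):
    for i in rng.permutation(len(X)):
        k = np.argmin(((protos - X[i])**2).sum(axis=1))   # winner prototype
        sign = 1.0 if labels[k] == y[i] else -1.0         # attract or repel
        protos[k] += sign * lr * (X[i] - protos[k])

# Classify by nearest prototype.
pred = labels[np.argmin(((protos[None] - X[:, None])**2).sum(-1), axis=1)]
assert (pred == y).mean() > 0.95
```

On this easy, unimodal problem the winner-take-all rule suffices; the failure mode the paper addresses appears when a class occupies several disjoint clusters and badly initialized prototypes never win for their own class.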

B. Hammer, M. Strickert, T. Villmann
Learning the Dynamic Neural Networks with the Improvement of Generalization Capabilities

This work addresses the problem of improving the generalization capabilities of continuous recurrent neural networks. The learning task is transformed into an optimal control framework in which the weights and the initial network state are treated as unknown controls. A new learning algorithm based on a variational formulation of Pontryagin’s maximum principle is proposed. Numerical examples are also given which demonstrate an essential improvement of generalization capabilities after the learning process of a recurrent network.

Mirosław Galicki, Lutz Leistritz, Herbert Witte
Model Clustering for Neural Network Ensembles

We show that large ensembles of (neural network) models, obtained e.g. in bootstrapping or sampling from (Bayesian) probability distributions, can be effectively summarized by a relatively small number of representative models. We present a method to find representative models through clustering based on the models’ outputs on a data set. We apply the method on models obtained through bootstrapping (Boston housing) and on a multitask learning example.

Bart Bakker, Tom Heskes
Does Crossover Probability Depend on Fitness and Hamming Differences in Genetic Algorithms?

The goal of this paper is to study whether there is a dependency between the probability of crossover and both the genetic similarity (in terms of Hamming distance) and the fitness difference between two individuals. In order to see the relation between these parameters, we will find a neural network that simulates the behavior of the probability of crossover with these differences as inputs. An evolutionary algorithm will be used, the goodness of every network being determined by a genetic algorithm that optimizes a well-known function.

José Luis Fernández-Villacañas Martín, Mónica Sierra Sánchez
Extraction of Fuzzy Rules Using Sensibility Analysis in a Neural Network

This paper proposes a new method for the extraction of knowledge from a trained feed-forward neural network. The extracted knowledge is expressed by fuzzy rules obtained directly from a sensibility analysis between the inputs and outputs of the relationship modeled by the neural network. This simple extraction method is based on the similarity of a fuzzy set to the derivative of the hyperbolic tangent function used as the activation function in the hidden layer of the neural network. The analysis performed is very useful not only for the extraction of knowledge, but also for assessing the importance of every extracted rule within the whole knowledge base and, furthermore, the importance of every input stimulating the neural network.

Jesús Manuel Besada-Juez, Miguel A. Sanz-Bobi
A Simulated Annealing and Resampling Method for Training Perceptrons to Classify Gene-Expression Data

We investigate the use of perceptrons for the classification of microarray data. Small round blue cell tumours of childhood are difficult to classify both clinically and via routine histology. Khan et al. [10] showed that a system of artificial neural networks can utilize gene expression measurements from microarrays and classify these tumours into four different categories. We used a simulated annealing-based method to learn a system of perceptrons, each obtained by resampling of the training set. Our results are comparable to those of Khan et al., indicating that there is a role for perceptrons in the classification of tumours based on gene expression data. We also show that it is critical to perform feature selection in this type of model.
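A minimal version of simulated annealing applied to perceptron weights, on synthetic separable data rather than gene-expression measurements and without the resampling step, might look like:

```python
import numpy as np

rng = np.random.default_rng(3)

# Linearly separable toy problem standing in for a (feature-selected) expression task.
X = rng.standard_normal((80, 5))
w_true = rng.standard_normal(5)
y = np.sign(X @ w_true)

def errors(w):
    """Number of misclassified samples under weight vector w."""
    return int(np.sum(np.sign(X @ w) != y))

w = rng.standard_normal(5)
T = 1.0
best_w, best_e = w.copy(), errors(w)
for step in range(3000):
    cand = w + T * rng.standard_normal(5)         # perturbation scaled by temperature
    d = errors(cand) - errors(w)
    if d <= 0 or rng.random() < np.exp(-d / T):   # Metropolis acceptance rule
        w = cand
        if errors(w) < best_e:
            best_w, best_e = w.copy(), errors(w)
    T *= 0.999                                    # geometric cooling schedule

assert best_e <= 8   # far below the ~50% error of a random weight vector
```

Annealing directly on the misclassification count sidesteps the non-differentiability of the perceptron's 0-1 loss, which is one motivation for using it instead of gradient descent.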

Andreas A. Albrecht, Staal A. Vinterbo, C. K. Wong, Lucila Ohno-Machado
Neural Minimax Classifiers

Many supervised learning algorithms are based on the assumption that the training data set reflects the underlying statistical model of the real data. However, this stationarity assumption may be partially violated in practice: for instance, if the cost of collecting data is class dependent, the class priors of the training data set may differ from those of the test set. A robust solution to this problem is to select the classifier that minimizes the error probability under worst-case conditions. This is known as the minimax strategy. In this paper we propose a mechanism to train a neural network in order to estimate the minimax classifier that is robust to changes in the class priors. The procedure is illustrated on a softmax-based neural network, although it can be applied to other structures. Several experimental results show the advantages of the proposed methods with respect to other approaches.
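The minimax idea can be illustrated with a one-dimensional toy problem rather than the paper's softmax network: for a threshold classifier on two unit-variance Gaussian classes, the worst-case error over unknown priors is minimized where the two class-conditional errors are equal.

```python
import numpy as np
from math import erf, sqrt

# Two unit-variance Gaussian classes centred at -1 and +1; decide class 1 if x > t.
def class_errors(t):
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
    e0 = 1 - Phi(t + 1)       # P(x > t | class 0, mean -1)
    e1 = Phi(t - 1)           # P(x < t | class 1, mean +1)
    return e0, e1

# Worst case over priors is max(e0, e1); scan thresholds for its minimizer.
ts = np.linspace(-2, 2, 4001)
worst = [max(class_errors(t)) for t in ts]
t_star = ts[int(np.argmin(worst))]

e0, e1 = class_errors(t_star)
assert abs(e0 - e1) < 1e-3    # minimax operating point equalizes the class errors
```

By symmetry the minimax threshold here is t = 0; the equal-error property is what makes the classifier's performance independent of whichever priors the test set happens to have.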

Rocío Alaiz-Rodríguez, Jesús Cid-Sueiro
Sampling Parameters to Estimate a Mixture Distribution with Unknown Size

We present a new view on the problem of parameter estimation in the field of Gaussian Mixture Models. Our approach is based on the idea of stable estimates and is realized by an algorithm which samples appropriate parameters. It further determines the adequate complexity of the model without extra effort. We show that the sampling approach avoids overfitting and that it outperforms maximum likelihood estimates in practical tasks.

Martin Lauer
Selecting Neural Networks for Making a Committee Decision

To improve recognition results, decisions of multiple neural networks can be aggregated into a committee decision. In contrast to the ordinary approach of utilizing all neural networks available to make a committee decision, we propose creating adaptive committees, which are specific for each input data point. A prediction network is used to identify classification neural networks to be fused for making a committee decision about a given input data point. The jth output value of the prediction network expresses the expectation level that the jth classification neural network will make a correct decision about the class label of a given input data point. The effectiveness of the approach is demonstrated on two artificial and three real data sets.

Antanas Verikas, Arunas Lipnickas, Kerstin Malmqvist
High-Accuracy Mixed-Signal VLSI for Weight Modification in Contrastive Divergence Learning

This paper presents an approach to on-chip, unsupervised learning. A circuit capable of changing a neuron’s synaptic weight with great accuracy is described and experimental results from its aVLSI implementation in a 0.6µm CMOS process are shown and discussed. We consider its use in the “contrastive divergence” learning scheme of the Product of Experts (PoE) architecture.

Patrice Fleury, Alan F. Murray, Martin Reekie
Data Driven Generation of Interactions for Feature Binding and Relaxation Labeling

We present a combination of unsupervised and supervised learning to generate a compatibility interaction for feature binding and labeling problems. We focus on the unsupervised data driven generation of prototypic basis interactions by means of clustering of proximity vectors, which are computed from pairs of data in the training set. Subsequently a supervised method recently introduced in [9] is used to determine coefficients to form a linear combination of the basis functions, which then serves as interaction. As special labeling dynamic we use the competitive layer model, a recurrent neural network with linear threshold neurons, and show an application to cell segmentation.

Sebastian Weng, Jochen J. Steil
A Hybrid Two-Stage Fuzzy ARTMAP and LVQ Neuro-fuzzy System for Online Handwriting Recognition

This paper presents a two-stage handwriting recognizer for classification of isolated characters that exploits explicit knowledge on characters’ shapes and execution plans. The first stage performs prototype extraction of the training data using a Fuzzy ARTMAP based method. These prototypes are able to improve the performance of the second stage consisting of LVQ codebooks by means of providing the aforementioned explicit knowledge on shapes and execution plans. The proposed recognizer has been tested on the UNIPEN international database achieving an average recognition rate of 90.15%, comparable to that reached by humans and other recognizers found in literature.

Miguel L. Bote-Lorenzo, Yannis A. Dimitriadis, Eduardo Gómez-Sánchez
A New Learning Method for Piecewise Linear Regression

A new connectionist model for the solution of piecewise linear regression problems is introduced; it is able to reconstruct both continuous and discontinuous real-valued mappings starting from a finite set of possibly noisy samples. The approximating function can assume a different linear behavior in each region of an unknown polyhedral partition of the input domain. The proposed learning technique combines local estimation, clustering in weight space, multicategory classification and linear regression in order to achieve the desired result. Through this approach, piecewise affine solutions for general nonlinear regression problems can also be found.
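The combination of local estimation, clustering in weight space, and per-region regression can be sketched as follows; the data, window size, and plain 2-means step are illustrative choices rather than the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy samples of a discontinuous piecewise-linear map on [-1, 1].
x = np.sort(rng.uniform(-1, 1, 300))
y = np.where(x < 0, 2 * x - 1, -x + 2) + 0.02 * rng.standard_normal(300)

# Step 1, local estimation: fit a short line around every sample and use its
# (slope, intercept) pair as that sample's coordinates in weight space.
coeffs = np.array([np.polyfit(x[max(0, i - 10):i + 11],
                              y[max(0, i - 10):i + 11], 1)
                   for i in range(len(x))])

# Step 2, clustering in weight space: plain 2-means on the local coefficients.
c = coeffs[[0, -1]].copy()                        # seed one centre per end
for _ in range(20):
    lab = np.argmin(((coeffs[:, None] - c[None])**2).sum(-1), axis=1)
    c = np.array([coeffs[lab == k].mean(axis=0) for k in (0, 1)])

# Step 3, regression per region: refit a line on each cluster's raw samples.
models = [np.polyfit(x[lab == k], y[lab == k], 1) for k in (0, 1)]

# Two clearly distinct linear regimes are recovered, and every sample is
# explained well by one of the two local models.
slopes = sorted(m[0] for m in models)
resid = np.abs(np.array([np.polyval(m, x) for m in models]).T - y[:, None])
assert slopes[1] - slopes[0] > 1.0
assert resid.min(axis=1).mean() < 0.25
```

Clustering on the local coefficients, rather than on the raw inputs, is what allows the jump at x = 0 to be detected: the two branches are far apart in weight space even where they are adjacent in input space.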

Giancarlo Ferrari-Trecate, Marco Muselli
Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems

We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for the optimisation of very large nonlinear systems. Based on earlier work by Orr et al. on adaptive momentum, an efficient yet extremely unstable stochastic gradient descent algorithm, we develop a stabilised adaptive momentum algorithm that is suitable for noisy nonlinear optimisation problems. The stability is improved by introducing a forgetting factor 0 ≤ λ ≤ 1 that smoothes the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four regions benchmark task.
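Plain stochastic gradient descent with a momentum buffer decayed by a forgetting factor λ, without the curvature-matrix products of the full adaptive-momentum algorithm, can be sketched on a correlated-input linear neuron (problem sizes are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Noisy linear regression with highly correlated inputs.
n, d = 500, 2
C = np.array([[1.0, 0.95], [0.95, 1.0]])       # input covariance, badly conditioned
X = rng.multivariate_normal(np.zeros(d), C, size=n)
w_true = np.array([1.0, -2.0])
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
v = np.zeros(d)                                 # momentum buffer
eta, lam = 0.01, 0.9                            # step size; forgetting factor
for epoch in range(40):
    for i in rng.permutation(n):
        g = (X[i] @ w - y[i]) * X[i]            # stochastic gradient of squared loss
        v = lam * v + g                         # decayed accumulation smooths trajectory
        w -= eta * v

assert np.allclose(w, w_true, atol=0.1)
```

The factor λ trades speed for stability: λ → 1 averages gradients over a long window (larger effective step, more inertia), while λ = 0 recovers plain SGD.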

Thore Graepel, Nicol N. Schraudolph
Potential Energy and Particle Interaction Approach for Learning in Adaptive Systems

Adaptive systems research is mainly concentrated on optimizing cost functions suited to the problem at hand. Recently, Principe et al. proposed a particle interaction model for information-theoretic learning. In this paper, inspired by this idea, we propose a generalization of the particle interaction model for learning and system adaptation. In addition, for the special case of supervised multi-layer perceptron (MLP) training we propose the interaction force backpropagation algorithm, which is a generalization of the standard error backpropagation algorithm for MLPs.

Deniz Erdogmus, Jose C. Principe, Luis Vielva, David Luengo
Piecewise-Linear Approximation of Any Smooth Output Function on the Cellular Neural Network

The Cellular Neural Network Universal Machine (CNNUM) [7] is a novel hardware architecture that makes use of the complex spatio-temporal dynamics performed in the Cellular Neural Network (CNN) [1] for solving real-time image processing tasks. Actual VLSI chip prototypes [6] have the limitation of implementing a fixed piecewise-linear (PWL) saturation output function. In this work, a novel algorithm for emulating a PWL approximation of any nonlinear output function on the CNNUM VLSI chip is presented.

Víctor M. Preciado
MDL Based Model Selection for Relevance Vector Regression

Relevance Vector regression is a form of Support Vector regression, recently proposed by M. E. Tipping, which allows a sparse representation of the data. The Bayesian learning algorithm proposed by the author leaves partially open the question of how to automatically choose the optimal model. In this paper we describe a model selection criterion inspired by the Minimum Description Length (MDL) principle. We show that our proposal is effective in finding the optimal kernel parameter both on an artificial dataset and in a real-world application.

Davide Anguita, Matteo Gagliolo
On the Training of a Kolmogorov Network

The Kolmogorov theorem states that continuous and bounded real-valued functions of n variables can always be represented by the superposition of functions of one variable and addition. Since every proof of the Kolmogorov theorem or its variants given so far has been constructive, such a representation can in principle be attained. This paper reviews a procedure for obtaining the Kolmogorov representation of a function, based on an approach given by David Sprecher. The construction is considered in more detail for an image function.

Mario Köppen
A New Method of Feature Extraction and Its Stability

In classification on a high-dimensional feature space, such as in face recognition problems, feature extraction techniques are usually used to overcome the so-called 'curse of dimensionality'. In this paper, we propose a new feature extraction method for classification problems based on conventional independent component analysis. The local stability of the proposed method is also dealt with. The proposed algorithm makes use of the binary class labels to produce two sets of new features: one that does not carry information about the class label (these features will be discarded) and one that does. The advantage is that general ICA algorithms become available for feature extraction by maximizing the joint mutual information between class labels and new features, although only for two-class problems. Using the new features, we can greatly reduce the dimension of the feature space without degrading the performance of classifying systems.

Nojun Kwak, Chong-Ho Choi
Visualization and Analysis of Web Navigation Data

In this paper, we present two new approaches for the analysis of web site users' behavior. The first is a synthetic visualization of log file data, and the second is a coding of sequence-based data. This coding allows us to carry out a vector quantization and thus to find meaningful prototypes in the data set. For this, the set of sessions is first partitioned and then a prototype is extracted from each of the resulting classes. This analytic process allows us to categorize the behaviors of web site users interested in particular sets of page categories on a commercial site.

Khalid Benabdeslem, Younes Bennani, Eric Janvier
Missing Value Estimation Using Mixture of PCAs

We apply mixtures of principal component analyzers (MPCA) to missing value estimation problems. A variational Bayes (VB) method for MPCA with missing values is developed. The missing values are regarded as hidden variables, and their estimation is done simultaneously with the parameter estimation. Using artificial data, we find that the VB method outperforms the maximum likelihood method. We also applied our method to DNA microarray data, where its performance surpassed that of the conventional k-nearest neighbor method.

Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, Shin Ishii
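The authors' method uses a full variational Bayes treatment of a mixture of PCAs. As a much simpler illustration of the underlying idea — treating missing entries as quantities re-estimated from a low-rank reconstruction — a single-PCA, EM-style imputation can be sketched as follows. The function name and the fixed-rank, fixed-iteration choices are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def pca_impute(X, rank=2, n_iter=50):
    """Fill missing entries (NaN) of X by iterative low-rank PCA reconstruction."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    # initialise missing entries with the column means of the observed values
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.where(miss)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        Xc = X - mu
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        recon = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[miss] = recon[miss]      # re-estimate only the missing entries
    return X
```

For exactly low-rank data this fixed-point iteration recovers the missing entries; the VB mixture model in the paper additionally handles local structure and model uncertainty.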
High Precision Measurement of Fuel Density Profiles in Nuclear Fusion Plasmas

This paper presents a method for deducing fuel density profiles of nuclear fusion plasmas in real time during an experiment. A Multi Layer Perceptron (MLP) neural network is used to create a mapping between plasma radiation spectra and indirectly deduced hydrogen isotope densities. By combining different measurements, a cross section of the density is obtained. For this problem, precision can be optimised by exploiting the fact that both the input errors and target errors are known a priori. We show that a small adjustment of the backpropagation algorithm can take this into account during training. For subsequent predictions by the trained model, Bayesian posterior intervals are derived, reflecting the known errors on inputs and targets both from the training set and the current input pattern. The model is shown to give reliable estimates of the full fuel density profile in real time, and could therefore be utilised for real-time feedback control of the fusion plasma.

Jakob Svensson, Manfred von Hellermann, Ralf König
Heterogeneous Forests of Decision Trees

In many cases it is better to extract a set of decision trees and a set of possible logical data descriptions instead of a single model. The trees that include premises with constraints on the distances from some reference points are more flexible because they provide nonlinear decision borders. Methods for creating heterogeneous forests of decision trees based on Separability of Split Value (SSV) criterion are presented. The results confirm their usefulness in understanding data structures.

Krzysztof Grąbczewski, Włodzisław Duch
Independent Component Analysis for Domain Independent Watermarking

A new principled domain independent watermarking framework is presented. The new approach is based on embedding the message in statistically independent sources of the covertext to minimise covertext distortion, maximise the information embedding rate and improve the method's robustness against various attacks. Experiments comparing the performance of the new approach on several standard attacks show the proposed approach to be competitive with other state of the art domain-specific methods.

Stéphane Bounkong, David Saad, David Lowe
Applying Machine Learning to Solve an Estimation Problem in Software Inspections

We use Bayesian neural network techniques to estimate the number of defects in a software document based on the outcome of an inspection of the document. Our neural networks clearly outperform standard methods from software engineering for estimating the defect content. We also show that selecting the right subset of features largely improves the predictive performance of the networks.

Thomas Ragg, Frank Padberg, Ralf Schoknecht
Clustering of Gene Expression Data by Mixture of PCA Models

Clustering techniques, such as hierarchical clustering, the k-means algorithm and self-organizing maps, are widely used to analyze gene expression data. Results of these algorithms depend on several parameters, e.g., the number of clusters. However, there is no theoretical criterion to determine such parameters. In order to overcome this problem, we propose a method using a mixture of PCA models trained by variational Bayes (VB) estimation. In our method, good clustering results are selected based on the free energy obtained within the VB estimation. Furthermore, by taking an ensemble of estimation results, a robust clustering is achieved without any biological knowledge. Our method is applied to a clustering problem for gene expression data during sporulation of Bacillus subtilis, and it is able to capture characteristics of the sigma cascade.

Taku Yoshioka, Ryouko Morioka, Kazuo Kobayashi, Shigeyuki Oba, Naotake Ogawsawara, Shin Ishii
Selecting Ridge Parameters in Infinite Dimensional Hypothesis Spaces

Previously, an unbiased estimator of the generalization error called the subspace information criterion (SIC) was proposed for a finite dimensional reproducing kernel Hilbert space (RKHS). In this paper, we extend SIC so that it can be applied to any RKHSs including infinite dimensional ones. Computer simulations show that the extended SIC works well in ridge parameter selection.

Masashi Sugiyama, Klaus-Robert Müller
A New Sequential Algorithm for Regression Problems by Using Mixture Distribution

A new sequential method for regression problems is studied. The proposed method is motivated by boosting methods for classification problems, in which weighted data are used to update the estimator. In this paper we construct a sequential estimation method from the viewpoint of nonparametric estimation by using a mixture distribution. The algorithm uses the weighted residuals of the training data. We compare the proposed algorithm to a greedy algorithm in a simple simulation.

Takafumi Kanamori
Neural-Based Classification of Blocks from Documents

The process of obtaining the logical structure of a given document from its geometric structure is known as document understanding. In this process, it is important to classify the distinct blocks or homogeneous regions that the document contains as reliably as possible. In this work, we propose a neural-based method to classify among manuscript text, typed text, drawings and photographic images. The excellent performance of the approach is demonstrated by the experiments performed.

Damián López, María José Castro
Feature Selection via Genetic Optimization

In this paper we present a novel Genetic Algorithm (GA) for feature selection in machine learning problems. We introduce a novel genetic operator, which we refer to as the m-features operator, that fixes the number of selected features. This operator reduces the size of the search space and improves GA performance and convergence. Simulations on synthetic and real problems have shown very good performance of the m-features operator, which outperforms other existing approaches to the feature selection problem.

Sancho Salcedo-Sanz, Mario Prado-Cumplido, Fernando Pérez-Cruz, Carlos Bousoño-Calzón
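One plausible sketch of an operator that fixes the number of selected features (the paper's exact m-features operator may differ; this crossover variant is our illustration): features chosen by both parents are inherited, and the remaining slots are filled at random from the features chosen by only one parent, so the child always carries exactly m features.

```python
import random

def m_features_crossover(p1, p2, m, rng=random):
    """Cross two binary feature masks while keeping exactly m features selected."""
    n = len(p1)
    both = [i for i in range(n) if p1[i] and p2[i]]       # agreed-upon features
    either = [i for i in range(n) if p1[i] != p2[i]]       # disputed features
    keep = both[:m]
    need = m - len(keep)
    if need > 0:
        keep += rng.sample(either, need)                   # fill the rest at random
    child = [0] * n
    for i in keep:
        child[i] = 1
    return child
```

If both parents select exactly m features, the disputed set is always large enough to complete the child.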
Neural Networks, Clustering Techniques, and Function Approximation Problems

To date, clustering techniques have always been oriented to solve classification and pattern recognition problems. However, some authors have applied them unchanged to construct initial models for function approximators. Nevertheless, classification and function approximation problems present quite different objectives. Therefore it is necessary to design new clustering algorithms specialized in the problem of function approximation.

Jesús González, Ignacio Rojas, Héctor Pomares
Evolutionary Training of Neuro-fuzzy Patches for Function Approximation

This paper describes how the fundamental principles of GAs can be hybridized with classical optimization techniques in the design of an evolutionary algorithm for neuro-fuzzy systems. The proposed algorithm preserves the robustness and global search capabilities of GAs and improves on their performance, adding new capabilities to fine-tune the solutions obtained.

Jesús González, Ignacio Rojas, Héctor Pomares, Alberto Prieto, K. Goser
Using Recurrent Neural Networks for Automatic Chromosome Classification

Partial recurrent connectionist models can be used for the classification of objects of variable length. In this work, an Elman network has been used for chromosome classification. Experiments were carried out using the Copenhagen data set. Local features were computed over slices normal to the axis of the chromosomes, which produced a type of time-varying input pattern. Results showed an overall error rate of 5.7%, which is a good performance in a task that does not take cell context into account (isolated chromosome classification).

César Martínez, Alfons Juan, Francisco Casacuberta
A Mixed Ensemble Approach for the Semi-supervised Problem

In this paper we introduce a mixed approach for the semi-supervised data problem. Our approach consists of an ensemble unsupervised learning part in which the labeled and unlabeled points are segmented into clusters. We then take advantage of the a priori information of the labeled points to assign classes to clusters, and proceed to predict new incoming points with the ensemble method. Thus, we can finally classify new data points according to the segmentation of the whole set and the association of its clusters to the classes.

Evgenia Dimitriadou, Andreas Weingessel, Kurt Hornik
Using Perceptrons for Supervised Classification of DNA Microarray Samples: Obtaining the Optimal Level of Information and Finding Differentially Expressed Genes

The success of the application of neural networks to DNA microarray data comes from their efficiency in dealing with noisy data. Here we describe a combined approach that provides, at the same time, an accurate classification of samples in DNA microarray gene expression experiments (different cancer cell lines, in this case) and allows the extraction of the genes, or clusters of co-expressing genes, that account for these differences. First we reduce the dataset of gene expression profiles to a number of non-redundant clusters of co-expressing genes. Then, the clusters' average values are used for training a perceptron, which produces an accurate classification of different classes of cell lines. The weights that connect the gene clusters to the cell lines are used to assess the relative importance of the genes in the definition of these classes. Finally, the biological role of these groups of genes is discussed.

Alvaro Mateos, Javier Herrero, Joaquín Dopazo
Lower Bounds for Training and Leave-One-Out Estimates of the Generalization Error

In this paper, we compare two well-known estimates of the generalization error: the training error and the leave-one-out error. We focus our work on lower bounds on the performance of these estimates. Contrary to common intuition, we show that in the worst case the leave-one-out estimate is worse than the training error.

Gerald Gavin, Olivier Teytaud
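The paper studies worst-case bounds rather than algorithms, but the leave-one-out estimate it analyses can be computed cheaply for some learners. For ridge regression, for instance, the LOO residuals have a well-known closed form via the hat matrix (a standard identity, not taken from the paper):

```python
import numpy as np

def loo_residuals_ridge(X, y, lam):
    """Closed-form leave-one-out residuals for ridge regression:
    e_i = (y_i - yhat_i) / (1 - H_ii), with hat matrix H = X (X'X + lam I)^{-1} X'."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))
```

This avoids refitting the model n times: a single fit yields all n leave-one-out residuals exactly.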
SSA, SVD, QR-cp, and RBF Model Reduction

We propose an application of SVD model reduction to the class of RBF neural models for improving performance in contexts such as on-line prediction of time series. The SVD is coupled with QR-cp factorization. It has been found that such a coupling leads to more precise extraction of the relevant information, even when used in a heuristic way. Singular Spectrum Analysis (SSA) and its relation to our method is also discussed. We analyse the performance of the proposed on-line algorithm using a benchmark chaotic time series and a difficult-to-predict, dynamically changing series.

Moisés Salmerón, Julio Ortega, Carlos García Puntonet, Alberto Prieto, Ignacio Rojas
Linkage Analysis: A Bayesian Approach

In this article we propose a method for linkage analysis that is based on Bayesian statistics. It is non-parametric in the sense that there is no need to specify disease parameters such as penetrance values. We show that the method has significantly more statistical power than existing methods on artificially created databases. Finally, the possibility to extend the method to multi-locus diseases is discussed.

Martijn A. R. Leisink, Hilbert J. Kappen, Han G. Brunner
On Linear Separability of Sequences and Structures

Linear separability of sequences and structured data is studied. On the basis of a theoretical model, necessary and sufficient conditions for nonlinear separability are derived via a well-known result for vectors. Examples of sufficient conditions for linear separability of both sequences and structured data are given.

Alessandro Sperduti
Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data

The concept of cluster stability is introduced to assess the validity of data partitionings found by clustering algorithms. It allows us to explicitly quantify the quality of a clustering solution, without being dependent on external information. The principle of maximizing the cluster stability can be interpreted as choosing the most self-consistent data partitioning. We present an empirical estimator for the theoretically derived stability index, based on resampling. Experiments are conducted on well known gene expression data sets, re-analyzing the work by Alon et al. [1] and by Spellman et al. [8].

Volker Roth, Mikio L. Braun, Tilman Lange, Joachim M. Buhmann
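A much-simplified sketch of the resampling idea (not the authors' exact estimator, which trains a predictor on one subsample to label the other): cluster two random half-samples with the same algorithm and score the permutation-matched label agreement on the points they share. High agreement across repetitions indicates a self-consistent partitioning. All function names here are illustrative.

```python
import numpy as np
from itertools import permutations

def agreement(u, v, k):
    """Best label agreement between two labelings under permutation of labels."""
    return max(np.mean(np.array([p[x] for x in u]) == v)
               for p in permutations(range(k)))

def stability_index(data, cluster_fn, k, n_rep=10, rng=None):
    """Resampling stability: cluster two random half-samples and score the
    permutation-matched agreement of their labels on the shared points."""
    rng = rng or np.random.default_rng(0)
    n, scores = len(data), []
    for _ in range(n_rep):
        a = rng.choice(n, n // 2, replace=False)
        b = rng.choice(n, n // 2, replace=False)
        la = dict(zip(a, cluster_fn(data[a], k)))
        lb = dict(zip(b, cluster_fn(data[b], k)))
        shared = [i for i in a if i in lb]
        if shared:
            scores.append(agreement([la[i] for i in shared],
                                    np.array([lb[i] for i in shared]), k))
    return float(np.mean(scores))
```

The brute-force permutation matching is only practical for small k; the paper's setting would use an optimal assignment instead.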
EM-Based Radial Basis Function Training with Partial Information

This work presents an EM approach for nonlinear regression with incomplete data. Radial Basis Function (RBF) Neural Networks are employed since their architecture is appropriate for an efficient parameter estimation. The training algorithm expectation (E) step takes into account the censorship over the data, and the maximization (M) step can be implemented in several ways. The results guarantee the convergence of the algorithm in the GEM (Generalized EM) framework.

Pedro J. Zufiria, Carlos Rivero
Stochastic Supervised Learning Algorithms with Local and Adaptive Learning Rate for Recognising Hand-Written Characters

Supervised learning algorithms (i.e. Back Propagation algorithms, BP) are reliable and widely adopted for real world applications. Among supervised algorithms, stochastic ones (e.g. Weight Perturbation algorithms, WP) exhibit analog VLSI hardware friendly features; however, they have not been validated on meaningful applications. This paper presents the results of a thorough experimental validation of the parallel WP learning algorithm on the recognition of handwritten characters. We adopted a local and adaptive learning rate management to increase efficiency. Our results demonstrate that the performance of the WP algorithm is comparable to that of BP, while the network complexity (i.e. the number of hidden neurons) is considerably lower. The average number of iterations to reach convergence is higher than in the BP case, but this cannot be considered a heavy drawback in view of the analog parallel on-chip implementation of the learning algorithm.

Matteo Giudici, Filippo Queirolo, Maurizio Valle
Input and Output Feature Selection

Feature selection is called wrapper whenever the classification algorithm is used in the selection procedure. Our approach makes use of linear classifiers wrapped into a genetic algorithm. As a proof of concept we check its performance against the UCI spam filtering problem, showing that wrapping linear neural networks performs best. However, making sense of data involves not only selecting input features but also output features. Generally, this is considered too much of a human task to be addressed by computers. Only a few algorithms, such as association rules, allow the output to change. One of the advantages of our approach is that it can be easily generalized to search for outputs and relevant inputs at the same time. This is addressed at the end of the paper and is currently being investigated.

Alejandro Sierra, Fernando Corbacho
Optimal Extraction of Hidden Causes

This paper presents a new framework extending previous work on multiple cause mixture models. We search for an optimal neural network codification of a given set of input patterns, which implies hidden cause extraction and redundancy elimination leading to a factorial code. We propose a new entropy measure whose maximization leads to both maximum information transmission and independence of internal representations for factorial input spaces in the absence of noise. No extra assumptions are needed, in contrast with previous models in which some information about the input space, such as the number of generators, must be known a priori.

Luis F. Lago-Fernández, Fernando Corbacho
Towards a New Information Processing Measure for Neural Computation

The understanding of the relation between structure and function in the brain requires theoretical frameworks capable of dealing with a large variety of complex experimental data. Likewise, neural computation strives to design structures from which complex functionality should emerge. The framework of information theory has been partially successful in explaining certain brain structures with respect to sensory transformations under restricted conditions. Yet classical measures of information have not explicitly taken into account some of the fundamental concepts in brain theory and neural computation: namely, that optimal coding depends on the specific task(s) to be solved by the system, and that autonomy and goal-orientedness also depend on extracting relevant information from the environment and specific knowledge from the receiver in order to affect it in the desired way. This paper presents a general (i.e. implementation independent) new information processing measure that takes these issues into account. It is based on measuring the transformations required to go from the original alphabet, in which the sensory messages are represented, to the objective alphabet, which depends on the implicit task(s) imposed by the environment-system relation.

Manuel A. Sánchez-Montañés, Fernando J. Corbacho
A Scalable and Efficient Probabilistic Information Retrieval and Text Mining System

A system for probabilistic information retrieval and text mining that is both scalable and efficient is presented. Separate feature extraction or stop-word lists are not needed since the system can remove unneeded parameters dynamically based on a local mutual information measure. This is shown to be as effective as using a global measure. A novel way of storing system parameters eliminates the need for a ranking step during information retrieval from queries. Probability models over word contexts provide a method to suggest related words that can be added to a query. Test results are presented on a categorization task and screen shots from a live system are shown to demonstrate its capabilities.

Magnus Stensmo
Maximum and Minimum Likelihood Hebbian Learning for Exploratory Projection Pursuit

This paper presents an extension to the learning rules of the Principal Component Analysis Network which has been derived to be optimal for a specific probability density function. We note this pdf is one of a family of pdfs and investigate the learning rules formed in order to be optimal for several members of this family. We show that the whole family of these learning rules can be viewed as methods for performing Exploratory Projection Pursuit. We show that these methods provide a simple robust method for the identification of structure in remote sensing images.

Donald MacDonald, Emilio Corchado, Colin Fyfe, Erzsebet Merenyi
Learning Context Sensitive Languages with LSTM Trained with Kalman Filters

Unlike traditional recurrent neural networks, the Long Short-Term Memory (LSTM) model generalizes well when presented with training sequences derived from regular and also simple nonregular languages. Our novel combination of LSTM and the decoupled extended Kalman filter, however, learns even faster and generalizes even better, requiring only the 10 shortest exemplars (n ≤ 10) of the context-sensitive language a^n b^n c^n to deal correctly with values of n up to 1000 and more. Even when we consider the relatively high update complexity per timestep, in many cases the hybrid offers faster learning than LSTM by itself.

Felix A. Gers, Juan Antonio Pérez-Ortiz, Douglas Eck, Jürgen Schmidhuber
Hierarchical Model Selection for NGnet Based on Variational Bayes Inference

This article presents a variational Bayes inference for the normalized Gaussian network, a kind of mixture model of local experts. In order to search for the optimal model structure, we develop a hierarchical model selection method. The performance of our method is evaluated using function approximation and nonlinear dynamical system identification problems. Our method achieved better performance than existing methods.

Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
Multi-layer Perceptrons for Functional Data Analysis: A Projection Based Approach

In this paper, we propose a new way to use Functional Multi-Layer Perceptrons (FMLP). In our previous work, we introduced a natural extension of Multi Layer Perceptrons (MLP) to functional inputs based on direct manipulation of the input functions. Here we propose instead to represent input and weight functions by their projections on a truncated basis. We show that the proposed model has the universal approximation property and that parameter estimation for this model is consistent. The new model is compared to the previous one on simulated data: performances are comparable, but training time is greatly reduced.

Brieuc Conan-Guez, Fabrice Rossi
Natural Gradient and Multiclass NLDA Networks

Natural gradient has recently been introduced as a method to improve the convergence of Multilayer Perceptron (MLP) training [1], as well as that of other neural network type algorithms. The key idea is to recast the training process as a problem in quasi maximum log-likelihood estimation of a certain semiparametric probabilistic model. This allows the natural introduction of a Riemannian metric tensor G in the probabilistic model space. Once G is computed, the "natural" gradient in this setting is $$ G\left( W \right)^{ - 1} \nabla _W e\left( {X,y;W} \right) $$ rather than the ordinary Euclidean gradient $$ \nabla _W e\left( {X,y;W} \right) $$. Here e(X,y;W) denotes an error function associated to a concrete pattern (X,y) and weight set W. For instance, in MLP training, e(X,y;W) = (y - F(X,W))^2/2, with F the MLP transfer function. Viewing (y - F(X,W))^2/2 as the log-likelihood of a probability density, the metric tensor is $$ G\left( W \right) = \int\!\!\int \frac{{\partial \log p}}{{\partial W}}\left( {\frac{{\partial \log p}}{{\partial W}}} \right)^t p\left( {X,y;W} \right)dXdy. $$ G(W) is also known as the Fisher information matrix, as it gives the variance of the Cramér-Rao bound for the optimal W estimator. In this work we consider natural gradient-like training for Non Linear Discriminant Analysis (NLDA) networks, a non-linear extension of Fisher's well-known Linear Discriminant Analysis introduced in [6] (more details below). Instead of following an approach along the previous lines, we observe that the expression above can be viewed as the covariance $$ G\left( W \right) = E\left[ {\nabla _W e\left( {X,y;W} \right)\nabla _W e\left( {X,y;W} \right)^t } \right] $$ of the random vector $$ \nabla _W e\left( {X,y;W} \right) $$.

José R. Dorronsoro, Ana González

Kernel Methods

Frontmatter
A Greedy Training Algorithm for Sparse Least-Squares Support Vector Machines

Suykens et al. [1] describe a form of kernel ridge regression known as the least-squares support vector machine (LS-SVM). In this paper, we present a simple but efficient greedy algorithm for constructing near-optimal sparse approximations of least-squares support vector machines, in which at each iteration the training pattern minimising the regularised empirical risk is introduced into the kernel expansion. The proposed method demonstrates superior performance when compared with the pruning technique described by Suykens et al. [1] on the motorcycle and Boston housing datasets.

Gavin C. Cawley, Nicola L. C. Talbot
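A hedged sketch of the greedy idea only: regularised least squares over a growing kernel basis, adding at each step the training point whose inclusion most reduces the regularised empirical risk. This is not the authors' exact LS-SVM formulation (which solves the full LS-SVM linear system); the function names, the RBF kernel, and the regularisation of the bias term are our illustrative choices.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix between row sets X and Z."""
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def greedy_sparse_lssvm(X, y, n_basis=5, lam=1e-2, gamma=1.0):
    """Greedy forward selection of kernel basis vectors: at each step add the
    training point whose inclusion most reduces the regularised squared loss."""
    n = len(X)
    S, beta = [], None
    for _ in range(n_basis):
        best = None
        for j in range(n):
            if j in S:
                continue
            T = S + [j]
            # basis columns k(., x_t) for t in T, plus a bias column
            Phi = np.hstack([rbf(X, X[T], gamma), np.ones((n, 1))])
            b = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(T) + 1), Phi.T @ y)
            loss = np.sum((y - Phi @ b) ** 2) + lam * b @ b
            if best is None or loss < best[0]:
                best = (loss, j, b)
        S.append(best[1])
        beta = best[2]
    return S, beta
```

Each candidate evaluation refits a small regularised least-squares problem; the paper's contribution lies in doing this selection efficiently within the LS-SVM framework.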
Selection of Meta-parameters for Support Vector Regression

We propose practical recommendations for selecting meta-parameters for SVM regression (that is, the ε-insensitive zone and the regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than the resampling approaches commonly used in SVM applications. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low-dimensional and high-dimensional regression problems. In addition, we compare the generalization performance of SVM regression (with the proposed choice of ε) with robust regression using the 'least-modulus' loss function (ε = 0). These comparisons indicate superior generalization performance of SVM regression.

Vladimir Cherkassky, Yunqian Ma
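The analytic prescriptions can be sketched as follows; the exact constants are an assumption here, taken from the form in which the authors later published these recommendations (C from the range of the response values, ε from the noise level and sample size), and the function name is ours:

```python
import numpy as np

def analytic_svr_params(y, noise_std):
    """Analytic SVR meta-parameter selection in the spirit of Cherkassky & Ma:
    C from the spread of the training responses, epsilon from the (known or
    estimated) noise standard deviation and the sample size."""
    n = len(y)
    C = max(abs(np.mean(y) + 3 * np.std(y)), abs(np.mean(y) - 3 * np.std(y)))
    eps = 3 * noise_std * np.sqrt(np.log(n) / n)
    return C, eps
```

Both quantities are computed directly from the training data, so no resampling loop is needed.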
Kernel Matrix Completion by Semidefinite Programming

We consider the problem of missing data in kernel-based learning algorithms. We explain how semidefinite programming can be used to perform an approximate weighted completion of the kernel matrix that ensures positive semidefiniteness and hence Mercer’s condition. In numerical experiments we apply a support vector machine to the XOR classification task based on randomly sparsified kernel matrices from a polynomial kernel of degree 2. The approximate completion algorithm leads to better generalisation and to fewer support vectors as compared to a simple spectral truncation method at the cost of considerably longer runtime. We argue that semidefinite programming provides an interesting convex optimisation framework for machine learning in general and for kernel-machines in particular.

Thore Graepel
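The spectral truncation baseline mentioned in the abstract can be sketched in a few lines: symmetrise the (possibly indefinite) matrix and clip its negative eigenvalues to zero, yielding the nearest positive semidefinite matrix in Frobenius norm. Function and parameter names are ours.

```python
import numpy as np

def spectral_truncation(K):
    """Repair an indefinite (e.g. incompletely observed) kernel matrix by
    clipping negative eigenvalues to zero."""
    K = (K + K.T) / 2.0                      # symmetrise first
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0.0, None)) @ V.T  # rebuild with eigenvalues >= 0
```

The SDP completion in the paper instead searches over all PSD completions consistent with the observed entries, which is why it outperforms this simple repair at higher computational cost.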
Incremental Sparse Kernel Machine

The Relevance Vector Machine (RVM) gives a probabilistic model for a sparse kernel representation. It achieves performance comparable to the Support Vector Machine (SVM) while using substantially fewer kernel bases. However, the computational complexity of the RVM in the training phase prohibits its application to large datasets. In order to overcome this difficulty, we propose an incremental Bayesian method for the RVM. Preliminary experiments show the efficiency of our method for large datasets.

Masa-aki Sato, Shigeyuki Oba
Frame Kernels for Learning

This paper deals with a way of constructing reproducing kernel Hilbert spaces and their associated kernels from frame theory. After briefly introducing frame theory, we give mild conditions on frame elements for spanning an RKHS. Examples of different kernels are then given based on wavelet frames. Issues in this way of building kernels for semiparametric learning are then discussed, and an application to a toy problem is described.

Alain Rakotomamonjy, Stéphane Canu
Robust Cross-Validation Score Function for Non-linear Function Estimation

In this paper a new method for tuning regularisation parameters or other hyperparameters of a learning process (non-linear function estimation) is proposed, called the robust cross-validation score function $$ CV_{S\text{-}fold}^{Robust} $$. $$ CV_{S\text{-}fold}^{Robust} $$ is effective for dealing with outliers and non-Gaussian noise distributions in the data. Illustrative simulation results demonstrate that the $$ CV_{S\text{-}fold}^{Robust} $$ method outperforms other cross-validation methods.

Jos De Brabanter, Kristiaan Pelckmans, Johan A. K. Suykens, Joos Vandewalle
Compactly Supported RBF Kernels for Sparsifying the Gram Matrix in LS-SVM Regression Models

In this paper we investigate the use of compactly supported RBF kernels for nonlinear function estimation with LS-SVMs. The choice of compact kernels, recently proposed by Genton, may lead to computational improvements and memory reduction. Examples, however, illustrate that compactly supported RBF kernels may lead to severe loss in generalization performance for some applications, e.g. in chaotic time-series prediction. As a result, the usefulness of such kernels may be much more application dependent than the use of the RBF kernel.

Bart Hamers, Johan A. K. Suykens, Bart De Moor
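One common compactly supported construction in the spirit of the kernels the abstract investigates: taper a Gaussian RBF kernel by a Wendland-type factor (1 - r/θ)_+^2, which is positive definite in up to three dimensions, so the product remains a valid kernel there. Entries with distance r ≥ θ vanish exactly and the Gram matrix becomes sparse. The specific taper, exponent, and names are our illustrative choices, not the paper's kernels.

```python
import numpy as np

def compact_rbf_gram(X, theta=1.0, gamma=1.0):
    """Gram matrix of a Gaussian RBF kernel tapered by a compactly supported
    factor (1 - r/theta)_+^2: entries with r >= theta are exactly zero."""
    r = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    taper = np.clip(1.0 - r / theta, 0.0, None) ** 2
    return np.exp(-gamma * r ** 2) * taper
```

The resulting sparsity is what enables the memory and computational savings; the abstract's caveat is that an aggressive θ can also discard information the estimator needs.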
The Leave-One-Out Kernel

Recently, several attempts have been made for deriving data-dependent kernels from distribution estimates with parametric models (e.g. the Fisher kernel). In this paper, we propose a new kernel derived from any distribution estimators, parametric or nonparametric. This kernel is called the Leave-one-out kernel (i.e. LOO kernel), because the leave-one-out process plays an important role to compute this kernel. We will show that, when applied to a parametric model, the LOO kernel converges to the Fisher kernel asymptotically as the number of samples goes to infinity.

Koji Tsuda, Motoaki Kawanabe
Support Vector Representation of Multi-categorical Data

We propose a new algorithm for the categorisation of data into multiple classes. It minimises a quadratic homogeneous program, and can be viewed as a generalisation of the well known support vector machines to multiple classes. For only one class it reduces to a quadratic problem, whose solution can be seen as an estimate of the support of a distribution. Given a set of labelled data, our algorithm estimates for each class a representative vector in a feature space. Each of these vectors is expressible as a linear combination of the training data in its class, mapped into feature space. Therefore our algorithm needs less parameters than other multi-class support vector approaches.

Silvio Borer, Wulfram Gerstner
Robust De-noising by Kernel PCA

Recently, kernel Principal Component Analysis has become a popular technique for feature extraction. It enables us to extract nonlinear features and therefore serves as a powerful preprocessing step for classification. One drawback, however, is that the extracted feature components are sensitive to outliers contained in the data; this is a characteristic common to all PCA-based techniques. In this paper, we propose a method that removes outliers in data vectors and replaces them with values estimated via kernel PCA. By repeating this process several times, we can obtain feature components less affected by outliers. We apply this method to a set of face image data and confirm its validity for a recognition task.

Takashi Takahashi, Takio Kurita
Maximum Contrast Classifiers

Within the Bayesian setting of classification we present a method for classifier design based on constrained density modelling. The approach leads to maximization of a contrast function which measures the discriminative power of the class-conditional densities used for classification. By an upper bound on the density contrast, the sensitivity of the classifiers can be increased in regions with low density differences, which are usually most important for discrimination. We introduce a parametrization of the contrast in terms of modified kernel density estimators with variable mixing weights. In practice the approach shows some favourable properties. First, for fixed hyperparameters, training of the resulting Maximum Contrast Classifier (MCC) is achieved by linear programming for optimization of the mixing weights. Second, for a certain choice of the density contrast bound and the kernel bandwidth, the maximum contrast solutions lead to sparse representations of the classifiers with good generalization performance, similar to the maximum margin solutions of support vector machines. Third, the method readily extends to the general multi-class problem, since training proceeds in the same way as in the binary case.

P. Meinicke, T. Twellmann, H. Ritter
Puncturing Multi-class Support Vector Machines

When using Support Vector Machines (SVMs), non-binary classification has usually been addressed by training several binary classifiers, because performance does not degrade compared to the multi-class SVM and the binary classifiers are simpler to train and implement. In this paper we show that the binary classifiers on which the multi-classification relies are not independent of each other, and that this dependence can be pruned using a puncturing mechanism, yielding much better multi-classification schemes, as the experiments carried out show.

Fernando Pérez-Cruz, Antonio Artés-Rodríguez
Multi-dimensional Function Approximation and Regression Estimation

In this communication, we generalize the Support Vector Machines (SVM) for regression estimation and function approximation to multi-dimensional problems. We propose a multi-dimensional Support Vector Regressor (MSVR) that uses a cost function with a hyperspherical insensitive zone, capable of obtaining better predictions than using an SVM independently for each dimension. The resolution of the MSVR is achieved by an iterative procedure over the Karush-Kuhn-Tucker conditions. The proposed algorithm is illustrated by computer experiments.

Fernando Pérez-Cruz, Gustavo Camps-Valls, Emilio Soria-Olivas, Juan José Pérez-Ruixo, Aníbal R. Figueiras-Vidal, Antonio Artés-Rodríguez
Detecting the Number of Clusters Using a Support Vector Machine Approach

In this work we introduce a new methodology to determine the number of clusters in a data set. We use a hierarchical approach that builds upon the use of any given (user-defined) clustering algorithm to produce a decision tree that returns the number of clusters. The decision rule takes advantage of the ability of Support Vector Machines (SVM) to detect both density gaps and high-density regions in data sets. The method has been successfully applied to a variety of artificial and real data sets, covering a broad range of structures, group densities, data dimensionalities and numbers of groups.

Javier M. Moguerza, Alberto Muñoz, Manuel Martín-Merino
Mixtures of Probabilistic PCAs and Fisher Kernels for Word and Document Modeling

We present a generative model for constructing continuous word representations using mixtures of probabilistic PCAs. Applied to co-occurrence data, the model performs word clustering and allows the visualization of each cluster in a reduced space. In combination with a simple document model, it permits the definition of low-dimensional Fisher scores which are used as document features. We investigate the models’ potential through kernel-based methods using the corresponding Fisher kernels.

George Siolas, Florence d’Alché-Buc

Robotics and Control

Frontmatter
Reinforcement Learning for Biped Locomotion

This paper studies the reinforcement learning (RL) method for central pattern generators (CPGs) that generate stable rhythmic movements such as biped locomotion. RL for biped locomotion is very difficult, since the biped robot is highly unstable and the system has continuous state and action spaces with a high degree of freedom. In order to deal with RL for CPGs, we propose a new RL method called the CPG-actor-critic method. We applied this method to RL for the biped robot. Computer simulation showed that our RL method was able to train the CPG such that the biped robot walks stably.

Masa-aki Sato, Yutaka Nakamura, Shin Ishii
Dynamical Neural Schmitt Trigger for Robot Control

The structure and function of a small but effective neural network controlling the behavior of an autonomous miniature robot are analyzed. The controller was developed with the help of an evolutionary algorithm, and it uses a recurrent connectivity structure allowing non-trivial dynamical effects. The interplay of three different hysteresis elements, leading to skilled behavior of the robot in challenging environments, is explicitly discussed.

Martin Hülse, Frank Pasemann
Evolutionary Artificial Neural Networks for Quadruped Locomotion

This paper outlines the results of successful research into the production of four-legged gaits in robots using Evolutionary Artificial Neural Networks. The system is based on a hierarchical model outlined in previous work, and a new neuron model has been developed for use in the system. The ANNs are combined in a flexible manner to control the gait of an animat during locomotion.

David McMinn, Grant Maxwell, Christopher MacLeod
Saliency Maps Operating on Stereo Images Detect Landmarks and Their Distance

We present a model that uses binocular visual input to detect landmarks and estimate their distance based on the disparity between the two images. Feature detectors provide input to saliency maps that find landmarks as combinations of features. Interactions between the feature detectors for the left and right images and between the saliency maps enable corresponding landmarks to be found. We test the model in the real world and show that it reliably detects landmarks and estimates their distances.

Jörg Conradt, Pascal Simon, Michel Pescatore, Paul F. M. J. Verschure
A Novel Approach to Modelling and Exploiting Uncertainty in Stochastic Control Systems

We consider an inversion-based neurocontroller for solving control problems of uncertain nonlinear systems. Classical approaches do not use uncertainty information in the neural network models. In this paper we show how we can exploit knowledge of this uncertainty to our advantage by developing a novel robust inverse control method. Simulations on a nonlinear uncertain second order system illustrate the approach.

Randa Herzallah, David Lowe
Tool Wear Prediction in Milling Using Neural Networks

An intelligent supervisory system built on a model-based approach is presented herein. A model created using Artificial Neural Networks (ANNs), able to predict the process output, is introduced in order to deal with the characteristics of such an ill-defined process. In order to predict tool wear, residual errors are used as the basis of a decision-making algorithm. Experimental tests were performed in a professional machining center. The results attained show the suitability and potential of this supervisory system for industrial applications.

Rodolfo E. Haber, A. Alique, J. R. Alique
Speeding-up Reinforcement Learning with Multi-step Actions

In recent years hierarchical concepts of temporal abstraction have been integrated in the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this paper we propose the concept of explicitly selecting time scale related actions if no subgoal-related abstract actions are available. This is realised with multi-step actions on different time scales that are combined in one single action set. The special structure of the action set is exploited in the MSA-Q-learning algorithm. By learning on different explicitly specified time scales simultaneously, a considerable improvement of learning speed can be achieved. This is demonstrated on two benchmark problems.

Ralf Schoknecht, Martin Riedmiller
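The idea of one action set mixing several time scales can be illustrated with tabular Q-learning on a toy corridor task. The corridor, the action durations, and all constants below are invented for illustration; the authors' MSA-Q-learning algorithm is more elaborate:

```python
import numpy as np

# Tabular Q-learning on a 1-D corridor with multi-step actions:
# each "action" is a (direction, duration) pair executed open-loop.
N, GOAL, GAMMA, ALPHA, EPS = 12, 11, 0.9, 0.5, 0.2
ACTIONS = [(d, n) for d in (-1, 1) for n in (1, 2, 4)]
Q = np.zeros((N, len(ACTIONS)))
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != GOAL:
        greedy = int(Q[s].argmax())
        a = int(rng.integers(len(ACTIONS))) if rng.random() < EPS else greedy
        d, n = ACTIONS[a]
        r, s2 = 0.0, s
        for _ in range(n):                     # execute the primitive move n times
            s2 = min(max(s2 + d, 0), N - 1)
            if s2 == GOAL:
                r = 1.0
                break
        # n-step backup for the temporally extended action
        # (discounting kept at gamma**n for simplicity)
        Q[s, a] += ALPHA * (r + GAMMA ** n * Q[s2].max() - Q[s, a])
        s = s2
```

Because a single update can bridge up to four time steps, value information propagates back from the goal in fewer updates than with one-step actions alone, which is the speed-up the abstract refers to.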
Extended Kalman Filter Trained Recurrent Radial Basis Function Network in Nonlinear System Identification

We consider the recurrent radial basis function (RRBF) network as a model of a nonlinear dynamic system. On-line parameter and structure adaptation is unified under the framework of the extended Kalman filter. The ability of the adaptive system to deal with high observation noise, and the generalization ability of the resulting RRBF network, are demonstrated in nonlinear system identification.

Branimir Todorović, Miomir Stanković, Claudio Moraga
Integration of Metric Place Relations in a Landmark Graph

This paper describes a graph embedding procedure which extends the topological information of a landmark graph with position estimates. The graph is used as an environment map for an autonomous agent, where the graph nodes contain information about places in two different ways: a panoramic image containing the landmark configuration, and the estimated recording position. Calculation of the graph embedding is done with a modified “multidimensional scaling” (MDS) algorithm, which makes use of distances and angles between nodes. It will be shown that graph circuits in particular are responsible for preventing the path integration error from growing without bound. Furthermore, a heuristic for the MDS algorithm is described which makes this scheme applicable to the exploration of larger environments. The algorithm is tested with an agent building a map of a virtual environment.

Wolfgang Hübner, Hanspeter A. Mallot
Hierarchical Object Classification for Autonomous Mobile Robots

An adaptive neural 3D-object recognition architecture for mobile robot applications is presented. During training, a hierarchy of LVQ classifiers based on feature vectors with increasingly higher dimensionality is generated. The hierarchy is extended exactly in those regions of the feature space, where objects cannot be distinguished using lower-dimensional feature vectors. During recall, this system can produce object classifications in an anytime fashion with increasingly more detailed and higher confident results. Experimental data obtained from application to two real-world data sets are very encouraging. We found many of the confusion classes to represent meaningful concepts, with obvious implications for symbol grounding and integration of subsymbolic and symbolic representations.

Steffen Simon, Friedhelm Schwenker, Hans A. Kestler, Gerhard Kraetzschmar, Günther Palm
Self Pruning Gaussian Synapse Networks for Behavior Based Robots

The ability to obtain the minimal network that allows a robot to perform a given behavior without having to determine what sensors the behavior requires and to what extent each must be considered is one of the objectives of behavior based robotics. In this paper we propose Gaussian Synapse Networks as a very efficient structure for obtaining behavior based controllers that verify these conditions. We present some results on the evolution of controllers using Gaussian Synapse Networks and discuss the way in which they improve the evolution through their ability to smoothly select to what extent each signal and interval is considered within the internal processing of the network. In fact, the main result presented here is the way in which these networks provide a very efficient mechanism to prune the networks, allowing the construction of minimal networks that only make use of the signal intervals required.

J. A. Becerra, R. J. Duro, J. Santos
Second-Order Conditioning in Mobile Robots

We have proposed a neural network that learns to control avoidance behaviors of a physical mobile robot through classical conditioning and operant conditioning. In this article we test whether our network can acquire second-order conditioning. During training we first associate the activation of the robot’s infrared sensors with collisions. Then, the activation of a visual sensor is repeatedly paired with the activation of the infrared sensors. Results show that the robot learns to elicit avoidance responses whenever the visual sensor becomes active.

Samuel Benzaquen, Carolina Chang
An Optimal Sensor Morphology Improves Adaptability of Neural Network Controllers

Animals show an abundance of different sensor morphologies, for example in insect compound eyes. However, the advantages of having highly specific sensor morphologies still remain unclear. In this paper we show that an appropriate sensor morphology can improve the learning performance of an agent’s neural controller significantly. Using a sensor morphology that is “optimised” for a given task environment the agent is able to learn faster and to adapt more quickly to changes.

Lukas Lichtensteiger, Rolf Pfeifer
Learning Inverse Kinematics via Cross-Point Function Decomposition

The main drawback of using neural networks to approximate the inverse kinematics (IK) of robot arms is the high number of training samples (i.e., robot movements) required to attain an acceptable precision. We propose here a trick, valid for most industrial robots, that greatly reduces the number of movements needed to learn or relearn the IK to a given accuracy. This trick consists in expressing the IK as a composition of learnable functions, each having half the dimensionality of the original mapping. A training scheme to learn these component functions is also proposed. Experimental results obtained by using PSOMs, with and without the decomposition, show that the time savings granted by the proposed scheme grow polynomially with the precision required.

V. Ruiz de Angulo, C. Torras

Selforganization

Frontmatter
The Principal Components Analysis Self-Organizing Map

We propose a new self-organizing neural model that performs Principal Components Analysis (PCA). It is also related to the ASSOM network, but its training equations are simpler. Furthermore, it does not need any grouping of the input samples by episodes. Experimental results are reported, which show that the new model has better performance than the ASSOM network in a number of benchmark problems.

Ezequiel López-Rubio, José Muñoz-Pérez, José Antonio Gómez-Ruiz
Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps

Several methods to visualize clusters in high-dimensional data sets using the Self-Organizing Map (SOM) have been proposed. However, most of these methods only focus on the information extracted from the model vectors of the SOM. This paper introduces a novel method to visualize the clusters of a SOM based on smoothed data histograms. The method is illustrated using a simple 2-dimensional data set and similarities to other SOM based visualizations and to the posterior probability distribution of the Generative Topographic Mapping are discussed. Furthermore, the method is evaluated on a real world data set consisting of pieces of music.

Elias Pampalk, Andreas Rauber, Dieter Merkl
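A minimal version of the smoothed-data-histogram idea can be written down directly: every sample votes not only for its best-matching unit but for its s nearest units, with decreasing weights. The linear weighting below is one plausible choice, not necessarily the paper's exact scheme:

```python
import numpy as np

def smoothed_data_histogram(codebook, data, s=3):
    """Each sample votes for its s best-matching units with
    linearly decreasing, normalized weights; summing the votes
    gives a smoothed histogram over the map units."""
    hist = np.zeros(len(codebook))
    w = np.arange(s, 0, -1, dtype=float)
    w /= w.sum()                                # weights sum to 1 per sample
    for x in data:
        d = np.linalg.norm(codebook - x, axis=1)
        for rank, unit in enumerate(np.argsort(d)[:s]):
            hist[unit] += w[rank]
    return hist
```

With s = 1 this reduces to an ordinary hit histogram; larger s smooths each sample's contribution over units that are close in input space, which is what makes cluster structure visible on the map.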
Rule Extraction from Self-Organizing Networks

Generalized relevance learning vector quantization (GRLVQ) [4] constitutes a prototype based clustering algorithm based on LVQ [5] with energy function and adaptive metric. We propose a method for extracting logical rules from a trained GRLVQ-network. Real valued attributes are automatically transformed to symbolic values. The rules are given in the form of a decision tree yielding several advantages: hybrid symbolic/subsymbolic descriptions can be obtained as an alternative and the complexity of the rules can be controlled.

Barbara Hammer, Andreas Rechtien, Marc Strickert, Thomas Villmann
Predictive Self-Organizing Map for Vector Quantization of Migratory Signals

Predictive self-organizing map (P-SOM) that performs an adaptive vector quantization of migratory signals is proposed. The P-SOM separates continuously varying components of the signal from random noise, resulting in a better performance of the adaptive vector quantization. An application to a communication system is presented.

Akira Hirose, Tomoyuki Nagashima
Categorical Topological Map

This paper introduces a topological map dedicated to the automatic classification of categorical data. Usually, topological maps use a numerical (or binary) coding of the categorical data during the learning process. In the present paper, we propose a probabilistic formalism in which the neurons represent probability tables. Two examples using real and synthetic data validate the approach. The results show the good quality of the topological order obtained, as well as its performance in classification.

Mustapha Lebbah, Christian Chabanon, Fouad Badran, Sylvie Thiria
Spike-Timing Dependent Competitive Learning of Integrate-and-Fire Neurons with Active Dendrites

Presented is a model of an integrate-and-fire neuron with active dendrites and a spike-timing dependent Hebbian learning rule. The learning algorithm effectively trains the neuron when responding to several types of temporal encoding schemes: temporal code with single spikes, spike bursts and phase coding. The neuron model and learning algorithm are tested on a neural network with a self-organizing map of competitive neurons. The goal of the presented work is to develop computationally efficient models rather than approximating the real neurons. The approach described in this paper demonstrates the potential advantages of using the processing functionalities of active dendrites as a novel paradigm of computing with networks of artificial spiking neurons.

Christo Panchev, Stefan Wermter, Huixin Chen
Parametrized SOMs for Object Recognition and Pose Estimation

We present the “Parametrized Self-Organizing Map” (PSOM) as a method for 3D object recognition and pose estimation. The PSOM can be seen as a continuous extension of the standard Self-Organizing Map which generalizes the discrete set of reference vectors to a continuous manifold. In the context of visual learning, manifolds based on PSOMs can be used to represent the appearance of various objects. We demonstrate this approach and its merits in an application example.

Axel Saalbach, Gunther Heidemann, Helge Ritter
An Effective Traveling Salesman Problem Solver Based on Self-Organizing Map

Combinatorial optimization seems to be a harsh field for Artificial Neural Networks (ANNs), and in particular the Traveling Salesman Problem (TSP) is an exemplar benchmark on which ANNs are today not competitive with the best heuristics from the operations research literature. The thesis upheld in this work is that the Self-Organizing feature Map (SOM) paradigm can be an effective solving method for the TSP, if combined with appropriate mechanisms for improving efficiency and accuracy. An original TSP solver based on the SOM is tested on the largest TSP benchmarks, on which other ANNs typically fail.

Alessio Plebe
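The basic SOM-ring mechanism underlying such solvers (without the paper's additional efficiency and accuracy mechanisms) fits in a few lines; the city set, ring size and decay schedules below are arbitrary illustrative choices:

```python
import numpy as np

# A ring of neurons is repeatedly pulled toward randomly drawn
# cities; reading the cities off in ring order yields a tour.
rng = np.random.default_rng(0)
cities = rng.random((10, 2))
n, iters = 8 * len(cities), 2000
ring = rng.random((n, 2))

for t in range(iters):
    c = cities[rng.integers(len(cities))]
    bmu = int(np.argmin(np.linalg.norm(ring - c, axis=1)))
    idx = np.arange(n)
    d = np.minimum(np.abs(idx - bmu), n - np.abs(idx - bmu))  # circular distance
    sigma = max(n / 8.0 * (1 - t / iters), 1.0)               # shrinking neighbourhood
    lr = 0.8 * (1 - t / iters) + 0.01                          # decaying learning rate
    ring += lr * np.exp(-d ** 2 / (2 * sigma ** 2))[:, None] * (c - ring)

# the tour is the order of the cities' best-matching ring neurons
tour = np.argsort([int(np.argmin(np.linalg.norm(ring - c, axis=1))) for c in cities])
```

As the neighbourhood shrinks, the elastic ring stretches into a closed curve through the cities; competitive solvers add mechanisms on top of exactly this core loop.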
Coordinating Principal Component Analyzers

Mixtures of Principal Component Analyzers can be used to model high dimensional data that lie on or near a low dimensional manifold. By linearly mapping the PCA subspaces to one global low dimensional space, we obtain a ‘global’ low dimensional coordinate system for the data. As shown by Roweis et al., ensuring consistent global low-dimensional coordinates for the data can be expressed as a penalized likelihood optimization problem. We show that a restricted form of the Mixtures of Probabilistic PCA model allows for a more efficient algorithm. Experimental results are provided to illustrate the viability of the method.

Jakob J. Verbeek, Nikos Vlassis, Ben Kröse
Lateral Interactions in Self-Organizing Maps

In the literature on topographic models of cortical organization, Kohonen’s self-organizing map is often treated as a computational short-cut version of a more detailed biological architecture, in which competition in the map is regulated by excitatory and inhibitory lateral interactions. A novel lateral interaction model will be presented here, whose investigation will show: first, that the behavior of the two models is not identical; and second, that the lateral interaction architecture behaves similarly to non-topographic algorithms, constructing representations of the input at intermediate levels of detail in the initial phases of training. This observation supports a novel interpretation of the topographic organization of the cerebral cortex.

Roberto Viviani
Complexity Selection of the Self-Organizing Map

This paper describes how the complexity of the Self-Organizing Map can be selected using the Minimum Message Length principle. The use of the method in textual data analysis is also demonstrated.

Anssi Lensu, Pasi Koikkalainen
Nonlinear Projection with the Isotop Method

Isotop is a new neural method for nonlinear projection of high-dimensional data. Isotop builds the mapping between the data space and a projection space by means of topology preservation. Actually, the topology of the data to be projected is approximated by the use of neighborhoods between the neural units. Isotop is provided with a piecewise linear interpolator for the projection of generalization data after learning. Experiments on artificial and real data sets show the advantages of Isotop.

John A. Lee, Michel Verleysen
Asymptotic Level Density of the Elastic Net Self-Organizing Feature Map

Whereas the Kohonen Self-Organizing Map shows an asymptotic level density following a power law with a magnification exponent of 2/3, an exponent of 1 would be desirable in order to provide an optimal mapping in the sense of information theory. In this paper, we study analytically and numerically the magnification behaviour of the Elastic Net algorithm as a model for self-organizing feature maps. In contrast to the Kohonen map, the Elastic Net shows no power law; for one-dimensional maps, however, the density nevertheless follows a universal magnification law, i.e. it depends only on the local stimulus density, is independent of position, and decouples from the stimulus density at other positions.

Jens Christian Claussen, Heinz Georg Schuster
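For reference, the Kohonen-map result the abstract contrasts with relates the stimulus density $p$ to the asymptotic density $\rho$ of the weight vectors:

```latex
\rho(w) \;\propto\; p(w)^{\alpha}, \qquad \alpha = \tfrac{2}{3},
```

while an information-theoretically optimal map would have $\alpha = 1$; the abstract's point is that the Elastic Net obeys no such power law at all, yet in one dimension still follows a universal, purely local magnification law.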
Local Modeling Using Self-Organizing Maps and Single Layer Neural Networks

The paper presents a method for time series prediction using local dynamic modeling. After embedding the input data in a reconstruction space using a memory structure, a self-organizing map (SOM) derives a set of local models from these data. Afterwards, a set of single layer neural networks, trained optimally with a system of linear equations, is applied at the SOM’s output. The goal of the last network is to fit a local model from the winning neuron and a set of neighbours of the SOM map. Finally, the performance of the proposed method was validated using two chaotic time series.

Oscar Fontenla-Romero, Amparo Alonso-Betanzos, Enrique Castillo, Jose C. Principe, Bertha Guijarro-Berdiñas
Distance Matrix Based Clustering of the Self-Organizing Map

Clustering of data is one of the main applications of the Self-Organizing Map (SOM). U-matrix is a commonly used technique to cluster the SOM visually. However, in order to be really useful, clustering needs to be an automated process. There are several techniques which can be used to cluster the SOM autonomously, but the results they provide do not follow the results of U-matrix very well. In this paper, a clustering approach based on distance matrices is introduced which produces results which are very similar to the U-matrix. It is compared to other SOM-based clustering approaches.

Juha Vesanto, Mika Sulkava
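The U-matrix that serves as the visual reference here is itself easy to compute from a trained SOM codebook; a minimal 4-neighbour version (rectangular grid assumed) could look like this:

```python
import numpy as np

def u_matrix(codebook):
    """codebook: (rows, cols, dim) array of SOM weight vectors.
    Returns each unit's mean distance to its 4-connected grid
    neighbours, a simple form of the U-matrix."""
    rows, cols, _ = codebook.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(codebook[i, j] - codebook[ni, nj]))
            U[i, j] = np.mean(dists)
    return U
```

High ridges in U mark units whose prototypes jump across a density gap, i.e. candidate cluster borders; a distance-matrix-based automatic clustering works from exactly this kind of information.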
Mapping the Growing Neural Gas to Situation Calculus

We propose a dynamic mapping of the operation of the Growing Neural Gas model to Situation Calculus, with the purpose of grounding the relatively higher level concepts of Situation Calculus to lower level signals. Since both the Situation Calculus and the Growing Neural Gas model were conceived with the express purpose of describing dynamic phenomena, this transformation is natural. We believe that the transformation will also be useful in data mining tasks. Finally, we present experimental results as an early evaluation of our method.

Dimitrios Vogiatzis, Andreas Stafylopatis
Robust Unsupervised Competitive Neural Network by Local Competitive Signals

Unsupervised competitive neural networks have been recognized as a powerful tool for pattern analysis, feature extraction and clustering analysis. Global competitive structures tend to depend critically on the number of elements in the network and on the noise properties of the space. In order to overcome these problems, this work presents an unsupervised competitive neural network characterized by units with an adaptive threshold and local inhibitory interactions among its cells. Each neural unit is based on a modified competitive learning law in which the threshold changes during the learning stage. It is shown that the proposed neuron is able, during the learning stage, to perform an automatic selection of the patterns that belong to a cluster, moving towards its centroid. The properties of this network are examined in a set of simulations on a data set composed of Gaussian mixtures.

Ernesto Chiarantoni, Giuseppe Acciani, Girolamo Fornarelli, Silvano Vergura
Goal Sequencing for Construction Agents in a Simulated Environment

A connectionist architecture enables a society of agents to efficiently construct 2D structures. The agents use internal spatial maps to compute a sequence of construction actions that reduces total distance traveled. All computations are done over grids of neurons interacting locally. Simulation results are presented.

Anand Panangadan, Michael G. Dyer
Nonlinear Modeling of Dynamic Systems with the Self-Organizing Map

In this paper we propose an unsupervised neural modeling technique called Vector-Quantized Temporal Associative Memory (VQ-TAM). Using VQ-TAM, Kohonen’s self-organizing map (SOM) becomes capable of approximating nonlinear dynamical mappings from time series of measured input-output data. The SOM produces modeling results as accurate as those produced by multilayer perceptron (MLP) networks, and better than those produced by radial basis function (RBF) networks, both the MLP and the RBF being based on supervised training. In addition, the SOM is less sensitive to weight initialization than MLP networks. The three networks are evaluated through simulations and compared with the linear ARX model in the forward modeling of a hydraulic actuator.

Guilherme A. de Barreto, Aluizio F. R. Araújo
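The core VQ-TAM idea, a codebook whose input part drives the winner search while an associated output part is updated alongside it and used only for prediction, can be sketched as follows (1-D lattice and toy schedules assumed; not the authors' exact formulation):

```python
import numpy as np

def train_vqtam(X, y, n_units=25, epochs=60, seed=1):
    """Toy VQ-TAM: each unit stores an input part (used for the
    winner search only) and an output part (used for prediction)."""
    rng = np.random.default_rng(seed)
    W_in = rng.random((n_units, X.shape[1]))
    W_out = rng.random(n_units)
    pos = np.arange(n_units)                      # 1-D lattice coordinates
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs) + 0.01        # decaying learning rate
        sigma = max(n_units / 4 * (1 - t / epochs), 0.5)
        for x, yt in zip(X, y):
            bmu = int(np.argmin(np.linalg.norm(W_in - x, axis=1)))
            h = np.exp(-((pos - bmu) ** 2) / (2 * sigma ** 2))
            W_in += lr * h[:, None] * (x - W_in)   # SOM update of the input part
            W_out += lr * h * (yt - W_out)         # associated output part
    return W_in, W_out

def predict(W_in, W_out, x):
    # winner search on the input part, prediction read from the output part
    return W_out[int(np.argmin(np.linalg.norm(W_in - x, axis=1)))]
```

For dynamic modeling, `x` would be a tapped-delay window of past inputs and outputs and `y` the next output, so the trained map acts as a vector-quantized approximation of the system's mapping.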
Implementing Relevance Feedback as Convolutions of Local Neighborhoods on Self-Organizing Maps

The Self-Organizing Map (SOM) can be used in implementing relevance feedback in an information retrieval system. In our approach, the map surface is convolved with a window function in order to spread the responses given by a human user for the seen data items. In this paper, a number of window functions with different sizes are compared in spreading positive and negative relevance information on the SOM surfaces in an image retrieval application. In addition, a novel method for incorporating location-dependent information on the relative distances of the map units in the window function is presented.

Markus Koskela, Jorma Laaksonen, Erkki Oja
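Spreading user responses over the map surface is a plain 2-D convolution; a zero-padded reference implementation (window shape and padding behaviour are illustrative assumptions) might look like:

```python
import numpy as np

def spread_relevance(relevance, window):
    """Convolve a 2-D map of user responses (+1 relevant, -1 not
    relevant, 0 unseen) with a window function, zero-padded at the
    map borders."""
    rh, rw = relevance.shape
    wh, ww = window.shape
    ph, pw = wh // 2, ww // 2
    padded = np.pad(relevance, ((ph, ph), (pw, pw)))
    out = np.zeros_like(relevance, dtype=float)
    for i in range(rh):
        for j in range(rw):
            # flipped window makes this a true convolution
            out[i, j] = np.sum(padded[i:i + wh, j:j + ww] * window[::-1, ::-1])
    return out
```

For symmetric windows the flip is a no-op; the location-dependent windows proposed in the paper would make `window` a function of the unit position (i, j) rather than one constant array.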
A Pareto Self-Organizing Map

Self-Organizing Feature Maps are used for a variety of tasks in visualization and clustering, acting to transform data from a high-dimensional original feature space to a (usually) two-dimensional grid. SOFMs use a similarity metric in the input space, and this composes individual feature differences in a way that is not always desirable. This paper introduces the concept of a Pareto SOFM, which partitions features into groups, defines separate metrics in each partition, and retrieves a set of prototypes that trade off matches in different partitions. It is suitable for a wide range of exploratory tasks, including visualization and clustering…

Andrew Hunter, Richard Lee Kennedy
A SOM Variant Based on the Wilcoxon Test for Document Organization and Retrieval

A variant of the self-organizing map algorithm is proposed in this paper for document organization and retrieval. Bigrams are used to encode the available documents, and signed ranks are assigned to these bigrams according to their frequencies. A novel metric based on the Wilcoxon signed-rank test exploits these ranks in assessing the contextual similarity between documents. This metric replaces the Euclidean distance employed by the self-organizing map algorithm in identifying the winner neuron. Experiments performed using both algorithms demonstrate the superior performance of the proposed variant over the self-organizing map algorithm with respect to the average recall-precision curves.

Apostolos Georgakis, Costas Kotropoulos, Ioannis Pitas
Learning More Accurate Metrics for Self-Organizing Maps

Improved methods are presented for learning metrics that measure only important distances. It is assumed that changes in primary data are relevant only to the extent that they cause changes in auxiliary data, available paired with the primary data. The metrics are here derived from estimators of the conditional density of the auxiliary data. More accurate estimators are compared, and a more accurate approximation to the distances is introduced. The new methods improved the quality of Self-Organizing Maps (SOMs) significantly for four of the five studied data sets.

Jaakko Peltonen, Arto Klami, Samuel Kaski
Correlation Visualization of High Dimensional Data Using Topographic Maps

Correlation analysis has always been a key technique for understanding data. However, traditional methods are only applicable on the whole data set, providing only global information on correlations. Correlations usually have a local nature and two variables can be directly and inversely correlated at different points in the same data set. This situation arises typically in nonlinear processes. In this paper we propose a method to visualize the distribution of local correlations along the whole data set using dimension reduction mappings. The ideas are illustrated through an artificial data example.

Ignacio Díaz Blanco, Abel A. Cuadrado Vega, Alberto B. Diez González

Signal and Time Series Analysis

Frontmatter
Continuous Unsupervised Sleep Staging Based on a Single EEG Signal

We report improvements on automatic continuous sleep staging using Hidden Markov Models (HMMs). Our totally unsupervised approach detects the cornerstones of human sleep (wakefulness, deep and REM sleep) with around 80% accuracy based on data from a single EEG channel. Contrary to our previous efforts, we trained the HMM on data from a single sleep lab instead of generalizing to data from diverse sleep labs. This solved our previous problem of detecting REM sleep.

Arthur Flexer, Georg Gruber, Georg Dorffner
Financial APT-Based Gaussian TFA Learning for Adaptive Portfolio Management

Adaptive portfolio management has been studied in the literature of neural nets and machine learning. The recently developed Temporal Factor Analysis (TFA) model mainly targeted for further study of the Arbitrage Pricing Theory (APT) is found to have potential applications in portfolio management. In this paper, we aim to illustrate the superiority of APT-based portfolio management over return-based portfolio management.

Kai Chun Chiu, Lei Xu
On Convergence of an Iterative Factor Estimate Algorithm for the NFA Model

The iterative fixed posteriori approximation (iterative FPA) has been empirically shown to be an efficient approach for the MAP factor estimate in the Non-Gaussian Factor Analysis (NFA) model. In this paper we further prove that it is exactly an EM algorithm for the MAP factor estimate problem. Thus its convergence can be guaranteed. We also empirically show that NFA has better generalization ability than Independent Factor Analysis (IFA) on data with small sample size.

Zhiyong Liu, Lei Xu
Error Functions for Prediction of Episodes of Poor Air Quality

Prediction of episodes of poor air quality using artificial neural networks is investigated. Logistic regression, conventional sum-of-squares regression and heteroscedastic sum-of-squares regression are employed for the task of predicting real-life episodes of poor air quality in urban Belfast due to SO2. In each case, a Bayesian regularisation scheme is used to prevent over-fitting of the training data and to provide pruning of redundant model parameters. Non-linear models assuming a heteroscedastic Gaussian noise process are shown to provide the best predictors of pollutant concentration of the methods investigated.

Robert J. Foxall, Gavin C. Cawley, Stephen R. Dorling, Danilo P. Mandic
Adaptive Importance Sampling Technique for Neural Detector Training

In this paper, we develop the use of an adaptive Importance Sampling (IS) technique in neural network training, for applications to detection in communication systems. Some topics are reconsidered, such as modifications of the error probability objective function (Pe), optimal and suboptimal IS probability density functions (biasing density functions), and adaptive importance sampling. A genetic algorithm was used for the neural network training, with an adaptive IS technique employed to improve the Pe estimates in each iteration of the training. Some simulation results of the training process are also included in this paper.

José L. Sanz-González, Francisco Álvarez-Vaquero
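
The core idea of IS for error probabilities can be sketched in a toy setting (not the paper's communication-system model): to estimate a tiny tail probability, sample from a biasing density centered in the error region and reweight by the likelihood ratio. The threshold `t` and the Gaussian statistic below are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
t = 4.0          # hypothetical decision threshold; Pe = P(N(0,1) > t)
n = 20000

# Crude Monte Carlo: with Pe ~ 3e-5, almost no samples land past t,
# so the estimate is essentially useless at this sample size.
crude = np.mean(rng.standard_normal(n) > t)

# Importance sampling: draw from the biasing density N(t, 1), so roughly
# half the samples fall in the error region, then reweight by
# phi(z) / phi(z - t) = exp(-t*z + t^2/2).
z = rng.standard_normal(n) + t
w = np.exp(-t * z + 0.5 * t ** 2)
pe_is = np.mean((z > t) * w)

exact = 0.5 * (1.0 - erf(t / sqrt(2.0)))   # true Gaussian tail probability
assert abs(pe_is - exact) / exact < 0.1    # IS estimate is accurate here
```

The adaptive variant in the paper re-centers the biasing density as training proceeds; the fixed shift used here is only the simplest instance of that idea.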
State Space Neural Networks for Freeway Travel Time Prediction

The highly non-linear characteristics of the freeway travel time prediction problem require a modeling approach capable of dealing with complex non-linear spatio-temporal relationships between the observable traffic quantities. Based on a state-space formulation of the travel time prediction problem, we derived a recurrent state-space neural network (SSNN) topology. The SSNN model is capable of accurately predicting experienced travel times, outperforming current practice by far, and produces approximately zero-mean normally distributed residuals, generally within a range of 10% of the real expected travel times. Furthermore, analyses of the internal states and the weight configurations revealed that the SSNN developed internal models closely related to the underlying traffic processes. This allowed us to rationally eliminate the insignificant parameters, resulting in a Reduced SSNN topology with just 63 adjustable weights, yielding a 72% reduction in model size without loss of predictive performance.

Hans van Lint, Serge P. Hoogendoorn, Henk J. van Zuylen
Overcomplete ICA with a Geometric Algorithm

We present an independent component analysis (ICA) algorithm based on geometric considerations [10] [11] to decompose a linear mixture of more sources than sensor signals. Bofill and Zibulevsky [2] recently proposed a two-step approach for the separation: first learn the mixing matrix, then recover the sources using a maximum-likelihood approach. We present an efficient method for the matrix-recovery step mimicking the standard geometric algorithm thus generalizing Bofill and Zibulevsky’s method.

Fabian J. Theis, Elmar W. Lang, Tobias Westenhuber, Carlos G. Puntonet
Improving Long-Term Online Prediction with Decoupled Extended Kalman Filters

Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform traditional RNNs when dealing with sequences involving not only short-term but also long-term dependencies. The decoupled extended Kalman filter learning algorithm (DEKF) works well in online environments and reduces significantly the number of training steps when compared to the standard gradient-descent algorithms. Previous work on LSTM, however, has always used a form of gradient descent and has not focused on true online situations. Here we combine LSTM with DEKF and show that this new hybrid improves upon the original learning algorithm when applied to online processing.

Juan A. Pérez-Ortiz, Jürgen Schmidhuber, Felix A. Gers, Douglas Eck
Market Modeling Based on Cognitive Agents

In this paper, we present an explanatory multi-agent model. The agents' decision making is based on cognitive systems with three basic features (perception, internal processing and action). The interaction of the agents allows us to capture the market dynamics. The three features are derived deductively from the assumption of homeostasis and constitute necessary conditions of a cognitive system. Given a changing environment, homeostasis can be seen as the attempt of a cognitive agent to maintain an internal equilibrium. We model the cognitive system with a time-delay recurrent neural network. We apply our approach to the DEM / USD FX-market. Fitting real-world data, our approach is superior to a preset benchmark (MLP).

Georg Zimmermann, Ralph Grothmann, Christoph Tietz, Ralph Neuneier
An Efficiently Focusing Large Vocabulary Language Model

Accurate statistical language models are needed, for example, for large vocabulary speech recognition. The construction of models that are computationally efficient and able to utilize long-term dependencies in the data is a challenging task. In this article we describe how a topical clustering obtained by ordered maps of document collections can be utilized for the construction of efficiently focusing statistical language models. Experiments on Finnish and English texts demonstrate that considerable improvements are obtained in perplexity compared to a general n-gram model and to manually classified topic categories. In the speech recognition task, the recognition history and the current hypothesis can be utilized to focus the model towards the current discourse or topic, and the focused model can then be applied to re-rank the hypotheses.

Mikko Kurimo, Krista Lagus
Neuro-classification of Bill Fatigue Levels Based on Acoustic Wavelet Components

This paper proposes a new method to classify bills (paper money) into different fatigue levels according to the extent of their damage. While a bill is passing through a banking machine, a characteristic acoustic signal is emitted from the bill. To classify the acoustic signal into three bill fatigue levels, we calculate the acoustic wavelet power pattern as the input to a competitive neural network with the Learning Vector Quantization (LVQ) algorithm. The experimental results show that the proposed method obtains better classification performance than the best of the conventional acoustic-signal-based classification methods; consequently, the LVQ algorithm demonstrates good classification ability.

Masaru Teranishi, Sigeru Omatu, Toshihisa Kosaka
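
For readers unfamiliar with LVQ, a single LVQ1 update step can be sketched as follows (toy 2-D features, not the paper's acoustic wavelet power patterns): the nearest prototype is pulled toward the sample if its class matches, and pushed away otherwise.

```python
import numpy as np

def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """One LVQ1 update: move the nearest prototype toward sample x if its
    class label matches y, away from x otherwise. Returns the winner index."""
    k = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] += sign * lr * (x - prototypes[k])
    return k

# Two prototypes for two fatigue levels (illustrative data only).
protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = [0, 1]
k = lvq1_step(protos, labels, np.array([0.2, 0.0]), y=0)
assert k == 0 and np.allclose(protos[0], [0.02, 0.0])
```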
Robust Estimator for the Learning Process in Neural Networks Applied in Time Series

Artificial Neural Networks (ANN) have been used to model non-linear time series as an alternative to ARIMA models. In this paper, Feedforward Neural Networks (FANN) are used as non-linear autoregressive (NAR) models. NAR models are shown to lack robustness to innovative and additive outliers; a single outlier can ruin an entire neural network fit. Neural networks are shown to model well in regions far from outliers, in contrast to linear models, where the entire fit is ruined. We propose an algorithm for NAR models that is robust to innovative and additive outliers. The algorithm is based on generalized maximum likelihood (GM) type estimators, which show advantages over conventional least squares methods. This sensitivity to outliers is demonstrated on a synthetic data set.

Héctor Allende, Claudio Moraga, Rodrigo Salas
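
The GM-estimator idea can be sketched with a Huber-type weight function inside an iteratively reweighted least-squares fit of an AR(1) coefficient; residuals beyond the cutoff get down-weighted, so a single gross outlier no longer dominates the fit. This is a minimal sketch of the general principle, not the paper's exact estimator.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weight w(r) = psi(r)/r: 1 for small residuals, c/|r| beyond c."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / a)

def robust_ar1(x, n_iter=10):
    """IRLS fit of an AR(1) coefficient with Huber down-weighting."""
    y, z = x[1:], x[:-1]
    phi = np.sum(y * z) / np.sum(z * z)              # least-squares start
    for _ in range(n_iter):
        r = y - phi * z
        s = np.median(np.abs(r)) / 0.6745 + 1e-12    # robust scale (MAD)
        w = huber_weights(r / s)
        phi = np.sum(w * y * z) / np.sum(w * z * z)  # weighted LS update
    return phi

# On a clean geometric AR(1) trajectory the fit recovers phi exactly.
assert abs(robust_ar1(0.8 ** np.arange(10)) - 0.8) < 1e-9
```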
An Improved Cumulant Based Method for Independent Component Analysis

An improved method for independent component analysis based on the diagonalization of cumulant tensors is proposed. It is based on Comon’s algorithm [1] but it takes third- and fourth-order cumulant tensors into account simultaneously. The underlying contrast function is also mathematically much simpler and has a more intuitive interpretation. It is therefore easier to optimize and approximate. A comparison with Comon’s algorithm, JADE [2] and FastICA [3] on different data sets demonstrates its performance.

Tobias Blaschke, Laurenz Wiskott
Finding the Optimal Continuous Model for Discrete Data by Neural Network Interpolation of Fractional Iteration

Given the complete knowledge of the state variables of a dynamical system at fixed intervals, it is possible to construct a mapping, which is a perfect discrete time model of the system. To embed this into a continuum, the translation equation has to be solved for this mapping. However, in general, neither existence nor uniqueness of solutions can be guaranteed, but fractional iterates of the mapping computed by a neural network can provide regularized solutions that exactly comply with the laws of physics for several examples. Here we extend this method to continuous embeddings which represent the true trajectories of the dynamical system.

Lars Kindermann, Achim Lewandowski, Peter Protzel
Support Vector Robust Algorithms for Non-parametric Spectral Analysis

A new approach to non-parametric spectral estimation on the basis of the Support Vector (SV) framework is presented. Two algorithms are derived, for both uniform and non-uniform sampling. The relationship between the SV free parameters and the underlying process statistics is discussed. The application to two real data examples, the sunspot numbers and Heart Rate Variability, shows the higher resolution and robustness of the SV spectral analysis algorithms.

José Luis Rojo-Álvarez, Arcadi García-Alberola, Manel Martínez-Ramón, Mariano Valdés, Aníbal R. Figueiras-Vidal, Antonio Artés-Rodríguez
Support Vector Method for ARMA System Identification: A Robust Cost Interpretation

This paper deals with the application of the Support Vector Method (SVM) methodology to the Auto Regressive and Moving Average (ARMA) linear-system identification problem. The SVM-ARMA algorithm for a single-input single-output transfer function is formulated. The relationship between the SVM coefficients and the residuals, together with the embedded estimation of the autocorrelation function, are presented. Also, the effect of the numerical regularization is used to highlight the robust cost character of this approach. A clinical example is presented for qualitative comparison with the classical Least Squares (LS) methods.

José Luis Rojo-Álvarez, Manel Martínez-Ramón, Aníbal R. Figueiras-Vidal, Mario de Prado-Cumplido, Antonio Artés-Rodríguez
Dynamics of ICA for High-Dimensional Data

The learning dynamics close to the initial conditions of an on-line Hebbian ICA algorithm has been studied. For large input dimension the dynamics can be described by a diffusion equation. A surprisingly large number of examples and an unusually low initial learning rate are required to avoid a stochastic trapping state near the initial conditions. Escape from this state results in symmetry breaking and the algorithm therefore avoids trapping in plateau-like fixed points which have been observed in other learning algorithms.

Gleb Basalyga, Magnus Rattray
Beyond Comon’s Identifiability Theorem for Independent Component Analysis

In this paper, Comon’s conventional identifiability theorem for Independent Component Analysis (ICA) is extended to the case of mixtures where several gaussian sources are present. We show, in an original and constructive proof, that using the conventional mutual information minimization framework, the separation of all the non-gaussian sources is always achievable (up to scaling factors and permutations). In particular, we prove that a suitably designed optimization framework is capable of seamlessly handling both the case of one single gaussian source being present in the mixture (separation of all sources achievable), as well as the case of multiple gaussian signals being mixed together with non-gaussian signals (only the non-gaussian sources can be extracted).

Riccardo Boscolo, Hong Pan, Vwani P. Roychowdhury
Temporal Processing of Brain Activity for the Recognition of EEG Patterns

This paper discusses three common strategies to incorporate temporal dynamics of brain activity to recognize 3 mental tasks from spontaneous EEG signals. The networks have been tested in a hard experimental setup; namely, generalization over different recording sessions while analyzing short time windows. It turns out that the simple local neural classifier currently embedded in our BCI, which averages the response to 8 consecutive EEG samples, is to be preferred to more complex time-processing networks such as TDNN and Elman-like. With this local classifier, users with some hours of training are able to operate several brain-actuated applications.

Alexandre Hauser, Pierre-Edouard Sottas, José del R. Millán
Critical Assessment of Option Pricing Methods Using Artificial Neural Networks

In this paper we compare the predictive ability of the Black-Scholes Formula (BSF) and Artificial Neural Networks (ANNs) to price call options by exploiting historical volatility measures. We use daily data for the S&P 500 European call options and the underlying asset and, furthermore, we employ nonlinearly interpolated risk-free interest rates from the Federal Reserve board for the period 1998 to 2000. Using the best models in each sub-period tested, our preliminary results demonstrate that by using historical measures of volatility, ANNs outperform the BSF. In addition, the ANNs' performance improves even more when a hybrid ANN model is utilized. Our results are significant and differ from previous literature. Finally, we are currently extending the research in order to: a) incorporate appropriate implied volatility per contract with the BSF and ANNs and b) investigate the applicability of the models using trading strategies.

Panayiotis Ch. Andreou, Chris Charalambous, Spiros H. Martzoukos
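
For reference, the Black-Scholes benchmark the ANNs are compared against is the standard closed-form European call price (no dividends); the parameter values below are illustrative, not from the paper's data set.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# At-the-money one-year call: the textbook value is about 10.45,
# and the price increases with volatility.
assert abs(bs_call(100, 100, 1.0, 0.05, 0.2) - 10.45) < 0.05
assert bs_call(100, 100, 1.0, 0.05, 0.2) < bs_call(100, 100, 1.0, 0.05, 0.4)
```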
Single Trial Detection of EEG Error Potentials: A Tool for Increasing BCI Transmission Rates

It is a well-known finding in human psychophysics that a subject’s recognition of having committed a response error is accompanied by specific EEG variations that can easily be observed in averaged event-related potentials (ERP). Here, we present a pattern recognition approach that allows for a robust single trial detection of this error potential from multichannel EEG signals. By designing classifiers that are capable of bounding false positives (FP), which would classify correct responses as errors, we achieve performance characteristics that make this method appealing for response-verification or even response-correction in EEG-based communication, i.e., brain-computer interfacing (BCI). This method provides a substantial improvement over the choice of a simple amplitude threshold criterion, as it had been utilized earlier for single trial detection of error potentials.

Benjamin Blankertz, Christin Schäfer, Guido Dornhege, Gabriel Curio
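
One simple way to bound false positives, sketched here as an assumption rather than the authors' procedure: set the decision threshold at a quantile of the classifier scores observed on correct trials, so that at most a chosen fraction of correct responses would be flagged as errors.

```python
import numpy as np

def threshold_for_fp(scores_correct, fp_bound=0.05):
    """Choose a threshold so that at most a fraction fp_bound of correct
    trials (with scores at or below it) would be flagged as errors."""
    return np.quantile(scores_correct, 1.0 - fp_bound)

# On the calibration scores themselves, the empirical FP rate respects
# the bound by construction.
scores = np.arange(100, dtype=float)   # hypothetical classifier outputs
thr = threshold_for_fp(scores, fp_bound=0.05)
assert np.mean(scores > thr) <= 0.05
```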
Dynamic Noise Annealing for Learning Temporal Sequences with Recurrent Neural Networks

We present an algorithm inspired by diffusion networks for learning the input/output mapping of temporal sequences with recurrent neural networks. Noise is added to the activation dynamics of the neurons of the hidden layer and annealed during learning of an output path probability distribution. Noise therefore plays the role of a learning parameter. We compare some results obtained on 2 temporal tasks with this “dynamic noise annealing” algorithm with other learning algorithms. Finally we discuss why adding noise to the state space variables can be better than adding stochasticity in the weight space.

Pierre-Edouard Sottas, Wulfram Gerstner
Convolutional Neural Networks for Radar Detection

The use of convolutional neural networks (CNNs) for radar detection is evaluated. The detector includes a time-frequency block that has been implemented by the Wigner-Ville distribution and the Short-Time Fourier Transform to test the suitability of both techniques. The CNN detectors are compared with the classic multilayer perceptron and with several traditional non-neural detectors. Preliminary results are shown using non-correlated and correlated Rayleigh-envelope clutter.

Gustavo López-Risueño, Jesús Grajal, Simon Haykin, Rosa Díaz-Oliver
A Simple Generative Model for Single-Trial EEG Classification

In this paper we present a simple and straightforward approach to the problem of single-trial classification of event-related potentials (ERP) in EEG. We exploit the well-known fact that event-related drifts in EEG potentials can well be observed if averaged over a sufficiently large number of trials. We propose to use the average signal and its variance as a generative model for each event class and use Bayes decision rule for the classification of new, unlabeled data. The method is successfully applied to a data set from the NIPS*2001 Brain-Computer Interface post-workshop competition.

Jens Kohlmorgen, Benjamin Blankertz
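
The generative model described above is simple enough to sketch directly: each class is modeled by the per-timepoint mean and variance of its training trials, and new data are assigned by Bayes decision rule (equal priors assumed here). The toy data are illustrative, not real EEG.

```python
import numpy as np

def fit_class(trials):
    """Class model: per-timepoint mean and variance over training trials."""
    return trials.mean(axis=0), trials.var(axis=0) + 1e-6  # variance floor

def log_likelihood(x, mu, var):
    """Gaussian log-likelihood of a trial x under a class model (mu, var)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x, models):
    """Bayes decision rule with equal priors: pick the most likely class."""
    return int(np.argmax([log_likelihood(x, mu, v) for mu, v in models]))

rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, size=(50, 8))   # 50 trials of event class 0
b = rng.normal(1.0, 0.1, size=(50, 8))   # 50 trials of event class 1
models = [fit_class(a), fit_class(b)]
assert classify(np.full(8, 0.05), models) == 0
assert classify(np.full(8, 0.95), models) == 1
```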
Robust Blind Source Separation Utilizing Second and Fourth Order Statistics

We introduce identifiability conditions for the blind source separation (BSS) problem, combining second and fourth order statistics. We prove that under these conditions, well known methods (like eigenvalue decomposition and joint diagonalization) can be applied with probability one, i.e., the set of parameters for which such a method does not solve the BSS problem has measure zero.

Pando Georgiev, Andrzej Cichocki
Adaptive Differential Decorrelation: A Natural Gradient Algorithm

In this paper, I introduce a concept of differential decorrelation which finds a linear mapping that minimizes the concurrent change of variables. Motivated by the differential anti-Hebbian rule [1], I develop a natural gradient algorithm for differential decorrelation and present its local stability analysis. The algorithm is successfully applied to the task of nonstationary source separation.

Seungjin Choi
An Application of SVM to Lost Packets Reconstruction in Voice-Enabled Services

Voice over IP (VoIP) is becoming very popular due to the huge range of services that can be implemented by integrating different media (voice, audio, data, etc.). Besides, voice-enabled interfaces for those services are being actively researched. Nevertheless, the impoverishment of voice quality due to packet losses severely affects the speech recognizers supporting those interfaces ([8]). In this paper, we compare the usual lost-packet reconstruction method with an SVM-based one that outperforms previous results.

Carmen Peláez-Moreno, Emilio Parrado-Hernández, Ascensión Gallardo-Antolín, Adrián Zambrano-Miranda, Fernando Díaz-de-María
Baum-Welch Learning in Discrete Hidden Markov Models with Linear Factorial Constraints

Here, I introduce a transformation-based method for extending the Baum-Welch algorithm to the training of discrete Hidden Markov Models subject to constraints on the parameters. A class of certain linear factorial constraints is described and shown to lead to exact reestimation formulas. Applying these constraints to the hidden state transitions makes it possible to estimate processes that are cartesian products of multiple sub-processes on differing timescales. The applicability of the method has been demonstrated previously using constraints on both hidden and observation processes. The potential benefit of the approach is discussed in qualitative comparison to factorial Hidden Markov Model architectures.

Jens R. Otterpohl
Mixtures of Autoregressive Models for Financial Risk Analysis

The structure of the time-series of returns for the IBEX35 stock index is analyzed by means of a class of non-linear models that involve probabilistic mixtures of autoregressive processes. In particular, a specification and implementation of probabilistic mixtures of GARCH processes is presented. These mixture models assume that the time series is generated by one of a set of alternative autoregressive models whose probabilities are produced by a gating network. The ultimate goal is to provide an adequate framework for the estimation of conditional risk measures, which can account for non-linearities, heteroskedastic structure and extreme events in financial time series. Mixture models are sufficiently flexible to provide an adequate description of these features and can be used as an effective tool in financial risk analysis.

Alberto Suárez
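
The GARCH building block of such mixtures is a short recursion worth making explicit. As a minimal sketch (a single GARCH(1,1) component, not the full gated mixture), the conditional variance evolves as sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]; parameter values below are illustrative.

```python
import numpy as np

def garch11_var(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) process, initialized at
    the unconditional variance omega / (1 - alpha - beta)."""
    sigma2 = np.empty(len(returns))
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# With zero returns the variance decays geometrically toward omega/(1-beta).
path = garch11_var(np.zeros(3), omega=0.1, alpha=0.1, beta=0.8)
assert np.allclose(path, [1.0, 0.9, 0.82])
```

In the mixture setting, a gating network would assign each observation a probability of having been generated by each such component.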

Vision and Image Processing

Frontmatter
Kernel-Based 3D Object Representation

In this paper we describe how kernel-based novelty detection can be used effectively to model 3D objects from unconstrained image sequences, in order to deal with object identification and recognition. In this framework, we introduce a similarity measure based on the Hausdorff distance, well suited to represent, identify, and recognize 3D objects from grey-level images. The effectiveness of the method is shown on the representation and identification of rigid 3D objects in cluttered environments.

Annalisa Barla, Francesca Odone
Multiresolution Support for Adaptive Image Restoration Using Neural Networks

This paper treats the restoration problem of degraded and noisy images. In order to keep the image structures unaltered, an adaptive regularization scheme is employed that allows a better compromise between the inversion of the degradation process and the smoothing. The inversion process is achieved by means of a modified Hopfield neural network. Moreover, the smoothing operation is accomplished in the wavelet basis by using the à trous algorithm. A multiresolution support is deduced and combined with a statistical analysis to compute the adaptive regularization, in which each scale (sub-image) is assigned one regularization parameter according to the spatial activity of the pixels which constitute it.

Souheila Ghennam, Khier Benmahammed
Audio-Visual Speech Recognition: One Pass Learning with Spiking Neurons

We present a new application in the field of impulse neurons: audio-visual speech recognition. The features extracted from the audio (cepstral coefficients) and the video (height and width of the mouth, percentage of black and white pixels in the mouth) are sufficiently simple to allow a real-time integration of the complete system. A generic preprocessing makes it possible to convert these features into an impulse sequence treated by the neural network which carries out the classification. The training is done in one pass: the user pronounces once all the words of the dictionary. The tests on the European M2VTS Data Base show the interest of such a system in audio-visual speech recognition. In the presence of noise in particular, audio-visual recognition is much better than recognition based on the audio modality only.

Renaud Séguier, David Mercier
An Algorithm for Image Representation as Independent Levels of Resolution

Recently it has been shown that natural images possess a special type of scale invariant statistics (multiscaling). In this paper, we will show how the multiscaling properties of images can be used to derive a redundancy-reducing oriented wavelet basis. This kind of representation can be learnt from the data and is optimally adapted for image coding; besides, it shows some features found in the visual pathway.

Antonio Turiel, Jean-Pierre Nadal, Néstor Parga
Circular Back-Propagation Networks for Measuring Displayed Image Quality

A system based on a neural-network estimates the perceived quality of digital pictures that had previously undergone image-enhancement algorithms. The objective system exploits the ability of feed-forward networks to handle multidimensional data with non-linear relationships. A Circular Back-Propagation network maps feature vectors into the associated quality ratings, thus estimating perceived quality. Feature vectors characterize the image at a global level by exploiting statistical properties of objective features, which are extracted on a block-by-block basis. A feature-selection procedure based on statistical analysis drives the composition of the objective metric set. Experimental results confirm the approach effectiveness, as the system provides a satisfactory approximation of subjective tests involving human voters.

Paolo Gastaldo, Rodolfo Zunino, Ingrid Heynderickx, Elena Vicario
Unsupervised Learning of Combination Features for Hierarchical Recognition Models

We propose a cortically inspired hierarchical feedforward model for recognition and investigate a new method for learning optimal combination-coding cells in intermediate stages of the hierarchical network. The model architecture is characterized by weight-sharing, pooling, and Winner-Take-All nonlinearities. We show that an unsupervised sparse coding learning rule can be used to obtain a recognition architecture that is competitive with other more formally abstracted recognition approaches based on supervised learning. We evaluate the performance on object and face databases.

Heiko Wersing, Edgar Körner
Type of Blur and Blur Parameters Identification Using Neural Network and Its Application to Image Restoration

An original solution to the problem of blur and blur parameter identification is presented in this paper. A neural network based on multi-valued neurons is used for the identification of the blur and its parameters. It is shown that using a simple single-layered neural network it is possible to identify the type of the distorting operator. Four types of blur are considered: defocus, rectangular, motion and Gaussian. The parameters of the corresponding operator are identified using a similar neural network. After the type of blur and its parameters have been identified, the image can be restored using several kinds of methods.

Igor Aizenberg, Taras Bregin, Constantine Butakoff, Victor Karnaukhov, Nickolay Merzlyakov, Olga Milukova
Using Neural Field Dynamics in the Context of Attentional Control

We present an application of dynamic neural fields for selection and tracking in the attentional control part of an active vision system. We propose a novel two-stage selection mechanism, in which the fields are used for the first selection stage. We discuss different variants, introducing 3D neural fields and systems of interconnected fields. The dynamics can be shown to achieve important goals in active vision like robust selection, multi-object tracking, and spatiotemporal integration.

Gerriet Backer, Bärbel Mertsching
A Component Association Architecture for Image Understanding

A constructive approach for the detection of objects with topological variances is introduced. The architecture enables shift- and scale-invariant detection and is trained through supervised learning. Representations of the input data are built by combining association elements in a hierarchical grid. This gives great flexibility of representation while employing only a few element types. Simulation results are given for the task of detecting windows of buildings in real-world images.

Jens Teichert, Rainer Malaka
Novelty Detection in Video Surveillance Using Hierarchical Neural Networks

A hierarchical self-organising neural network is described for the detection of unusual pedestrian behaviour in video-based surveillance systems. The system is trained on a normal data set, with no prior information about the scene under surveillance, thereby requiring minimal user input. Nodes use a trace activation rule and feedforward connections that are modified so higher layer nodes are sensitive to trajectory segments traced across the previous layer. Top layer nodes have binary lateral connections and corresponding “novelty accumulator” nodes. Lateral connections are set between co-occurring nodes, generating a signal to prevent accumulation of the novelty measure along normal sequences. In abnormal sequences the novelty accumulator nodes are allowed to increase their activity, generating an alarm state.

Jonathan Owens, Andrew Hunter, Eric Fletcher
Vergence Control and Disparity Estimation with Energy Neurons: Theory and Implementation

The responses of disparity-tuned neurons computed according to the energy model are used for reliable vergence control of a stereo camera head and for disparity estimation. Adjustment of symmetric vergence is driven by minimization of global image disparity resulting in greatly reduced residual disparities. To estimate disparities, cell activities of four frequency channels are pooled and normalized. In contrast to previous active stereo systems based on Gabor filters, our approach uses the responses of simulated neurons which model complex cells in the vertebrate visual cortex.

Wolfgang Stürzl, Ulrich Hoffmann, Hanspeter A. Mallot
Population Coding of Multiple Edge Orientation

We present a probabilistic population coding model of Gabor filter responses. Based on the analytically derived orientation tuning function and a von Mises mixture model of the filter responses, a probability density function of the local orientation in a given point can be extracted through a parameter estimation procedure. The probability density captures angular information at edges, corners or T-junctions and also yields a contrast invariant description of the certainty of each orientation estimate, which can be characterized in terms of the entropy of the corresponding mixture component.

Niklas Lüdtke, Richard C. Wilson, Edwin R. Hancock
A Neural Model of the Fly Visual System Applied to Navigational Tasks

We investigate how elementary motion detectors (EMDs) can be used to control behavior. We have developed a model of the fly visual system which operates in real time under real world conditions and was tested in course and altitude stabilization tasks using a flying robot. While the robot could stabilize gaze, i.e. orientation, we found that stabilizing translational movements requires more elaborate preprocessing of the visual input and fine tuning of the EMDs. Our results show that in order to control gaze and altitude, EMD information needs to be computed in different processing streams.

Cyrill Planta, Jörg Conradt, Adrian Jencik, Paul Verschure
A Neural Network Model for Pattern Recognition Based on Hypothesis and Verification with Moving Region of Attention

We present a neural network model for pattern recognition which works successfully even if only a part of a pattern is presented on the retina. During the recognition process, (i) local features are extracted, (ii) a hypothesis for a partial pattern is generated using shift-invariant features, (iii) the hypothesis is verified by collating with the real positions of features. The verification process gradually corrects positional displacement of the presented partial pattern while the processes (i)-(iii) are executed. Computer simulations show that the model is tolerant for vast amounts of shift, deformation and noise.

Masao Shimomura, Shunji Satoh, Shogo Miyake, Hirotomo Aso
Automatic Fingerprint Verification Using Neural Networks

This paper presents an application of Learning Vector Quantization (LVQ) neural network (NN) to Automatic Fingerprint Verification (AFV). The new approach is based on both local (minutiae) and global image features (shape signatures). The matched minutiae are used as reference axis for generating shape signatures which are then digitized to form a feature vector describing the fingerprint. A LVQ NN is trained to match the fingerprints using the difference of a pair of feature vectors. The results show that the integrated system significantly outperforms the minutiae-based system alone in terms of classification accuracy. It also confirms the ability of the trained NN to have consistent performance on unseen databases.

Anna Ceguerra, Irena Koprinska
Fusing Images with Multiple Focuses Using Support Vector Machines

Optical lenses, particularly those with long focal lengths, suffer from the problem of limited depth of field. Consequently, it is often difficult to obtain good focus for all the objects in the scene. One approach to address this problem is by performing image fusion, i.e., several pictures with different focus points are combined to a single image. This paper proposes a multifocus image fusion method based on the discrete wavelet frame transform and support vector machines. Experimental results show that the proposed method outperforms the conventional approach based on the discrete wavelet transform and maximum selection rule, particularly when there is slight camera/object movement or mis-registration of the source images.

Shutao Li, James T. Kwok, Yaonan Wang
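
The conventional baseline the paper improves on can be sketched with a blockwise activity rule: copy each block from whichever source image shows more local variation, a rough proxy for being in focus. This simplified sketch stands in for the paper's wavelet-frame features and SVM decision.

```python
import numpy as np

def fuse_by_activity(img_a, img_b, block=8):
    """Blockwise multifocus fusion: for each block, keep the version from
    the source image with the higher local variance (more 'in focus')."""
    out = np.empty_like(img_a)
    h, w = img_a.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            a = img_a[i:i + block, j:j + block]
            b = img_b[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = a if a.var() >= b.var() else b
    return out

# Toy pair: A is sharp (textured) on the left half, B on the right half;
# fusion recovers the fully textured image.
tex = np.tile(np.array([0.0, 1.0] * 8), (16, 1))
flat = np.full((16, 16), 0.5)
A = np.hstack([tex[:, :8], flat[:, 8:]])
B = np.hstack([flat[:, :8], tex[:, 8:]])
assert np.allclose(fuse_by_activity(A, B, block=8), tex)
```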
An Analog VLSI Pulsed Neural Network for Image Segmentation Using Adaptive Connection Weights

An analog VLSI pulsed neural network for image segmentation using adaptive connection weights is presented. The network marks segments in the image through synchronous firing patterns. The synchronization is achieved through adaption of connection weights. The adaption uses only local signals in a data-driven and self-organizing way. It is shown that for the proposed adaption rules a simple analog VLSI implementation is feasible due to the required local connections and the data-driven self-organizing approach.

Arne Heittmann, Ulrich Ramacher, Daniel Matolin, Jörg Schreiter, Rene Schüffny
Kohonen Maps Applied to Fast Image Vector Quantization

Vector Quantization (VQ) is a powerful technique for image compression, but its coding complexity may be an important drawback. Self-Organizing Maps (SOM) are well suited for topologically ordered codebook design. We propose to use that topology for reducing image coding time. Using inter-block correlations, the nearest neighbor search is restricted to the neighborhood of the previously used code vector instead of the entire codebook. We obtained a reduction of up to 84% in coding time compared to a full search.

Christophe Foucher, Daniel Le Guennec, Gilles Vaucher
Unsupervised Neural Network Approach for Efficient Video Description

MPEG-4 object-oriented video codec implementations are rapidly emerging as a solution for compressing audio-video information efficiently, suitable for narrowband applications. A different view is proposed in this paper: successive images in a video sequence turn out to be very close to each other. Each image of the sequence can be seen as a vector in a hyperspace, and the whole video can be considered as a curve traced by the image vector over time. The curve can be sampled to represent the whole video, and its evolution through the video space can be reconstructed from these video samples. Any image in the hyperspace can be obtained by means of a reconstruction algorithm, in analogy with the reconstruction of an analog signal from its samples; here, however, the multi-dimensional nature of the problem requires knowledge of the position in the space and a suitable interpolating kernel function. The definition of an appropriate Video Key-frames Codebook (VKC) is introduced to simplify video reproduction; a good-quality prediction of an image in the sequence can be obtained with only a few information parameters. Once the VKC has been created and stored, a generic image in the video sequence can be referred to the selected key-frames in the codebook and reconstructed in the hyperspace from its samples. The focus of this paper is on the analysis phase of a given video sequence. Preliminary results seem promising.

Giuseppe Acciani, Ernesto Chiarantoni, Daniela Girimonte, Cataldo Guaragnella
Neural Networks Retraining for Unsupervised Video Object Segmentation of Videoconference Sequences

In this paper, efficient performance generalization of neural network classifiers is accomplished for unsupervised video object segmentation in videoconference/videophone sequences. Each time conditions change, a retraining phase is activated and the neural network classifier is adapted to the new environment. During retraining, both former and current knowledge are utilized so that good network generalization is achieved. The retraining algorithm reduces to the minimization of a convex function subject to linear constraints, leading to very fast network weight adaptation. Current knowledge is extracted in an unsupervised manner using a face-body detector based on Gaussian p.d.f. models. A binary template-matching technique is also incorporated, which imposes shape constraints on candidate face regions. Finally, the retrained network performs video object segmentation in the new environment. Several experiments on real sequences indicate the promising performance of the proposed adaptive neural network as an efficient video object segmentation tool.

Klimis S. Ntalianis, Nikolaos D. Doulamis, Anastasios D. Doulamis, Stefanos D. Kollias
Learning Face Localization Using Hierarchical Recurrent Networks

One of the major parts of human-computer interface applications, such as face recognition and video telephony, consists in the exact localization of a face in an image. Here, we propose to use hierarchical neural networks with local recurrent connectivity to solve this task, even in the presence of complex backgrounds, difficult lighting, and noise. Our network is trained using a database of gray-scale still images and manually determined eye coordinates. It is able to produce reliable and accurate eye coordinates for unknown images by iteratively refining an initial solution. The performance of the proposed approach is evaluated against a large test set. The fast network update allows for real-time operation.

Sven Behnke
A Comparison of Face Detection Algorithms

We present a systematic comparison of the techniques used in some of the most successful neurally inspired face detectors. We report three main findings: First, we present a new analysis of how the SNoW algorithm of Roth, Yang, and Ahuja (2000) achieves its high performance. Second, we find that representations based on local receptive fields, such as those in Rowley, Baluja, and Kanade, consistently provide better performance than full-connectivity approaches. Third, we find that ensemble techniques, especially those using active sampling such as AdaBoost and Bootstrap, consistently improve performance.

Ian R. Fasel, Javier R. Movellan

Special Session: Adaptivity in Neural Computation

Frontmatter
Adaptive Model Selection for Digital Linear Classifiers

Adaptive model selection can be defined as the process by which an optimal classifier h* is automatically selected from a function class H by using only a given set of examples z. Such a process is particularly critical when the number of examples in z is small, because the classical splitting of z into training, test, and validation sets is then impossible. In this work we show that the joint investigation of two bounds on the prediction error of the classifier can be used to select h*, with z serving for both model selection and training. Our learning algorithm is a simple kernel-based Perceptron that can easily be implemented in counter-based digital hardware. Experiments on two real-world data sets show the validity of the proposed method.
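The kernel-based Perceptron mentioned in the abstract can be sketched roughly as follows; the RBF kernel, epoch count, and update rule here are illustrative assumptions, not details taken from the paper. The point of interest for hardware is that training only increments integer dual coefficients, which maps naturally onto counters:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_perceptron(X, y, kernel=rbf, epochs=10):
    """Train a kernel Perceptron in dual form; returns the alpha counters.

    Each mistake on example i simply increments alpha[i], so the learned
    state is a vector of integer counters over the training set.
    """
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            pred = 1.0 if s >= 0 else -1.0
            if pred != y[i]:
                alpha[i] += 1
    return alpha

def predict(alpha, X, y, x, kernel=rbf):
    """Classify a new point from the stored counters and training set."""
    s = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X)))
    return 1.0 if s >= 0 else -1.0
```

With a non-linear kernel such a Perceptron can fit problems like XOR that a linear one cannot, while the trainable state remains a handful of counters.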

Andrea Boni
Sequential Learning in Feedforward Networks: Proactive and Retroactive Interference Minimization

We tackle the catastrophic interference problem with a formal approach. The problem is divided into two subproblems. The first arises when one tries to introduce some new information in a previously trained network, without distorting the stored information. The second is how to encode a set of patterns so as to preserve them when new information has to be stored. We suggest solutions to both subproblems without using local representations or retraining.

Vicente Ruiz de Angulo, Carme Torras
Automatic Hyperparameter Tuning for Support Vector Machines

This work describes the application of the Maximal Discrepancy (MD) criterion to the process of hyperparameter setting in SVMs and points out the advantages of such an approach over existing theoretical and practical frameworks. The resulting theoretical predictions are compared with a k-fold cross-validation empirical method on some benchmark datasets, showing that the MD technique can be used for automatic SVM model selection.

Davide Anguita, Sandro Ridella, Fabio Rivieccio, Rodolfo Zunino
Conjugate Directions for Stochastic Gradient Descent

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
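The fast Hessian-gradient products mentioned above can be illustrated with a simple finite-difference approximation of a Hessian-vector product (the paper presumably uses an exact R-operator-style product; this numerical sketch, with a quadratic toy loss, is only illustrative):

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H v by central differences of the gradient:

        H v ≈ (g(w + eps*v) - g(w - eps*v)) / (2*eps)

    This costs two gradient evaluations, never forming H explicitly,
    which is what makes Krylov-subspace ideas feasible at scale.
    """
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

# Quadratic toy loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda w: A @ w

w = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
hv = hessian_vector_product(grad, w, v)
# For a quadratic loss the approximation equals A @ v up to rounding.
```

Repeatedly applying such products to the mini-batch gradient spans the low-dimensional Krylov subspace {g, Hg, H²g, ...} within which the conjugate directions are sought.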

Nicol N. Schraudolph, Thore Graepel

Special Session: Recurrent Neural Systems

Frontmatter
Architectural Bias in Recurrent Neural Networks — Fractal Analysis

We have recently shown that when initiated with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines [6,8]. Following [2], we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We obtain both lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions.

Peter Tiňo, Barbara Hammer
Continuous-State Hopfield Dynamics Based on Implicit Numerical Methods

A novel technique is presented that implements continuous-state Hopfield neural networks on a digital computer. Instead of the usual forward Euler rule, the backward method is used. The stability and Lyapunov function of the proposed discrete model are indirectly guaranteed, even for reasonably large step size. This is possible because discretization by implicit numerical methods inherits the stability of the continuous-time model. On the contrary, the forward Euler method requires a very small step size to guarantee convergence to solutions. The presented technique takes advantage of the extensive research on continuous-time stability, as well as recent results in the field of dynamical analysis of numerical methods. Also, standard numerical methods allow for synchronous activation of neurons, thus leading to performance enhancement. Numerical results are presented that illustrate the validity of this approach when applied to optimization problems.
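A minimal sketch of such a backward-Euler update, assuming Hopfield-style dynamics du/dt = -u + W tanh(u) + b and an inner fixed-point solver for the implicit equation (both are assumptions for illustration; the paper's exact formulation and solver may differ):

```python
import numpy as np

def backward_euler_step(u, W, b, h, n_iter=50):
    """One backward-Euler step for du/dt = -u + W tanh(u) + b.

    The implicit equation u' = u + h * (-u' + W tanh(u') + b) is
    rearranged to u' = (u + h * (W tanh(u') + b)) / (1 + h) and solved
    by fixed-point iteration; unlike forward Euler, stability does not
    hinge on h being very small.
    """
    u_next = u.copy()
    for _ in range(n_iter):
        u_next = (u + h * (W @ np.tanh(u_next) + b)) / (1.0 + h)
    return u_next

def run(W, b, u0, h=1.0, steps=200):
    """Iterate the implicit discrete model until (near) convergence."""
    u = u0
    for _ in range(steps):
        u = backward_euler_step(u, W, b, h)
    return u
```

With a symmetric weight matrix the iterates settle on an equilibrium of the continuous-time model, i.e. a state where -u + W tanh(u) + b vanishes, even for a step size h = 1 that would be aggressive for forward Euler.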

Miguel A. Atencia, Gonzalo Joya, Francisco Sandoval
Time-Scaling in Recurrent Neural Learning

Recurrent Backpropagation schemes for fixed point learning in continuous-time dynamic neural networks can be formalized through a differential-algebraic model, which in turn leads to singularly perturbed training techniques. Such models clarify the relative time-scaling between the network evolution and the adaptation dynamics, and allow for rigorous local convergence proofs. The present contribution addresses some related issues in a discrete-time context: fixed point problems can be analyzed in terms of iterations with different evolution rates, whereas periodic trajectory learning can be reduced to a multiple fixed point learning problem via Poincaré maps.

Ricardo Riaza, Pedro J. Zufiria
Backmatter
Metadata
Title
Artificial Neural Networks — ICANN 2002
Edited by
José R. Dorronsoro
Copyright year
2002
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-46084-8
Print ISBN
978-3-540-44074-1
DOI
https://doi.org/10.1007/3-540-46084-5