
2009 | Book

Advances in Self-Organizing Maps

7th International Workshop, WSOM 2009, St. Augustine, FL, USA, June 8-10, 2009. Proceedings

Edited by: José C. Príncipe, Risto Miikkulainen

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

These proceedings contain refereed papers presented at the 7th WSOM, held at the Casa Monica Hotel, St. Augustine, Florida, June 8–10, 2009. We designed the workshop to serve as a regular forum for researchers in academia and industry who are interested in the exciting field of self-organizing maps (SOM). The program includes excellent examples of the use of SOM in many areas of social sciences, economics, computational biology, engineering, time series analysis, data visualization and computer science, as well as a vibrant set of theoretical papers that keep pushing the envelope of the original SOM. Our deep appreciation is extended to Teuvo Kohonen and Ping Li for the plenary talks and to Amaury Lendasse for the organization of the special sessions. Our sincere thanks go to the members of the Technical Committee and other reviewers for their excellent and timely reviews, and above all to the authors whose contributions made this workshop possible. Special thanks go to Julie Veal for her dedication and hard work in coordinating the many details necessary to put together the program and local arrangements. Jose C. Principe, Risto Miikkulainen

Table of contents

Frontmatter
Batch-Learning Self-Organizing Map for Predicting Functions of Poorly-Characterized Proteins Massively Accumulated

As a result of the decoding of large numbers of genome sequences, numerous proteins whose functions cannot be identified by homology searches of amino acid sequences have accumulated and remain of no use to science and industry. Novel methods for predicting protein function are urgently needed. We previously developed the Batch-Learning SOM (BL-SOM) for genome informatics; here, we developed BL-SOM to predict protein functions on the basis of similarity in the oligopeptide composition of proteins. Oligopeptides are component parts of a protein and are involved in the formation of its functional motifs and structural parts. Using oligopeptide frequencies in 110,000 proteins classified into 2853 function-known COGs (clusters of orthologous groups), BL-SOM faithfully reproduced the COG classifications; therefore, proteins whose functions could not be identified by homology searches could be related to function-known proteins. BL-SOM was applied to predict the functions of large numbers of proteins obtained from metagenome analyses.

Takashi Abe, Shigehiko Kanaya, Toshimichi Ikemura
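The input to BL-SOM in this paper is the oligopeptide composition of each protein. A minimal sketch of such a feature vector follows, assuming overlapping k-mer counts normalized to relative frequencies; the function name and the default k = 2 (dipeptides) are illustrative assumptions, not the authors' exact settings.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def oligopeptide_composition(sequence: str, k: int = 2) -> dict:
    """Relative frequencies of all k-mers (oligopeptides) in a protein sequence.

    Overlapping windows are counted; windows containing non-standard residues
    (e.g. 'X') are skipped. k = 2 is an illustrative choice (400 dimensions).
    """
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    total = 0
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return {w: 0.0 for w in counts}
    return {w: c / total for w, c in counts.items()}
```

For "MKKLLA" the five dipeptides MK, KK, KL, LL and LA each receive a frequency of 0.2. Note that the vector length grows as 20^k, which is one reason a batch SOM over many proteins is attractive for this representation.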
Incremental Unsupervised Time Series Analysis Using Merge Growing Neural Gas

We propose Merge Growing Neural Gas (MGNG) as a novel unsupervised growing neural network for time series analysis. MGNG combines the state-of-the-art recursive temporal context of Merge Neural Gas (MNG) with the incremental Growing Neural Gas (GNG) and thereby enables the online analysis of unbounded and possibly infinite time series. There is no need to define the number of neurons a priori, and only constant parameters are used. In order to focus on frequent sequence patterns, an entropy maximization strategy controls the creation of new neurons. Experimental results demonstrate reduced time complexity compared to MNG while retaining similar accuracy in time series representation.

Andreas Andreakis, Nicolai v. Hoyningen-Huene, Michael Beetz
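The recursive temporal context that MGNG inherits from MNG can be sketched as follows, assuming the usual MNG formulation: the global context is c_t = (1 - beta) * w_I + beta * c_I for the previous winner I, and each neuron's distance blends the input error and the context error with a weight alpha. Function names and the default alpha and beta are hypothetical; growth, edge aging and the entropy-based insertion rule of MGNG are not shown.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_context(w_prev: Vector, c_prev: Vector, beta: float) -> Vector:
    """Global context c_t = (1 - beta) * w_{I(t-1)} + beta * c_{I(t-1)}."""
    return [(1 - beta) * w + beta * c for w, c in zip(w_prev, c_prev)]


def find_winner(x: Vector, context: Vector, weights: List[Vector],
                contexts: List[Vector], alpha: float) -> int:
    """Neuron minimizing (1 - alpha) * ||x - w_i||^2 + alpha * ||c_t - c_i||^2."""
    dists = [(1 - alpha) * sq_dist(x, w) + alpha * sq_dist(context, c)
             for w, c in zip(weights, contexts)]
    return min(range(len(dists)), key=dists.__getitem__)


def winners_for_sequence(seq, weights, contexts, alpha=0.5, beta=0.75):
    """Run the recursive context over a sequence with fixed neurons (no learning)."""
    context = [0.0] * len(weights[0])  # assumption: start from a zero context
    out = []
    for x in seq:
        i = find_winner(x, context, weights, contexts, alpha)
        out.append(i)
        context = merge_context(weights[i], contexts[i], beta)
    return out
```

Because the context is built from the previous winner's weight and context, it acts as a fading summary of the sequence history without storing the history explicitly.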
Clustering Hierarchical Data Using Self-Organizing Map: A Graph-Theoretical Approach

The application of the Self-Organizing Map (SOM) to hierarchical data remains an open issue, because such data lack inherent quantitative information. Past studies have suggested binary encoding and Generalizing SOM as techniques that transform hierarchical data into numerical attributes. Based on graph theory, this paper puts forward a novel approach that processes hierarchical data into a numerical representation for SOM-based clustering. The paper validates the proposed graph-theoretical approach via complexity theory and experiments on real-life data. The results suggest that the graph-theoretical approach has lower algorithmic complexity than Generalizing SOM, and can yield SOMs with significantly higher cluster validity than binary encoding does. Thus, the graph-theoretical approach can form a data-preprocessing step that extends SOM to the domain of hierarchical data.

Argyris Argyrou
Time Series Clustering for Anomaly Detection Using Competitive Neural Networks

In this paper we evaluate competitive learning algorithms in the task of identifying anomalous patterns in time series data. The methodology consists of computing decision thresholds from the distribution of quantization errors produced by normal training data. These thresholds are then used to classify incoming data samples as normal or abnormal. For this purpose, we carry out performance comparisons among five competitive neural networks (SOM, Kangas’ Model, TKM, RSOM and Fuzzy ART) on simulated and real-world time series data.

Guilherme A. Barreto, Leonardo Aguayo
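The thresholding idea in this abstract can be sketched independently of which competitive network produced the prototypes. This is a minimal sketch assuming sliding-window inputs, Euclidean quantization error to the best-matching prototype, and an upper empirical quantile as the decision threshold; the window helper, the 0.95 quantile and all names are illustrative assumptions rather than the paper's exact rule.

```python
import math


def sliding_windows(series, length):
    """Turn a 1-D time series into overlapping input vectors of a given length."""
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


def quantization_error(x, prototypes):
    """Euclidean distance from x to its best-matching prototype."""
    return min(math.dist(x, p) for p in prototypes)


def fit_threshold(normal_data, prototypes, quantile=0.95):
    """Decision threshold: empirical quantile of QEs on normal training data."""
    errors = sorted(quantization_error(x, prototypes) for x in normal_data)
    idx = min(len(errors) - 1, math.ceil(quantile * len(errors)) - 1)
    return errors[max(idx, 0)]


def is_anomalous(x, prototypes, threshold):
    """Flag a sample whose QE exceeds what normal data produced."""
    return quantization_error(x, prototypes) > threshold
```

The prototypes would come from training any of the compared networks on normal data only; samples far from every prototype then exceed the threshold.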
Fault Prediction in Aircraft Engines Using Self-Organizing Maps

Aircraft engines are designed to be used for several decades. Their maintenance is a challenging and costly task, for obvious safety reasons. The goal is to ensure proper operation of the engines in all conditions, with zero probability of failure, while taking aging into account. The fact that the same engine is sometimes used on several aircraft must be taken into account too.

Maintenance can be improved if an efficient procedure for the prediction of failures is implemented. The primary source of information on the health of the engines comes from measurements taken during flights. Several variables such as the core speed, the oil pressure and quantity, the fan speed, etc. are measured, together with environmental variables such as the outside temperature, altitude, aircraft speed, etc.

In this paper, we describe the design of a procedure aimed at visualizing successive data measured on aircraft engines. The data are multi-dimensional measurements on the engines, which are projected on a self-organizing map in order to allow us to follow the trajectories of these data over time. The trajectories consist of a succession of points on the map, each of them corresponding to the two-dimensional projection of the multi-dimensional vector of engine measurements. Analyzing the trajectories aims at visualizing any deviation from normal behavior, making it possible to anticipate an operational failure.

However, raw engine measurements are inappropriate for such an analysis: they are influenced by external conditions, and may in addition vary between engines. In this work, we first process the data with a General Linear Model (GLM) to eliminate the effects of the engines and of the measured environmental conditions. The residuals are then used as inputs to a Self-Organizing Map for easy visualization of trajectories.

Marie Cottrell, Patrice Gaubert, Cédric Eloy, Damien François, Geoffroy Hallaux, Jérôme Lacaille, Michel Verleysen
Incremental Figure-Ground Segmentation Using Localized Adaptive Metrics in LVQ

Vector quantization methods are confronted with a model selection problem, namely the number of prototypical feature representatives used to model each class. In this paper we present an incremental learning scheme in the context of figure-ground segmentation. In the presence of locally adaptive metrics and noisy supervised information, we use a parallel evaluation scheme combined with a local utility function to organize a learning vector quantization (LVQ) network with an adaptive number of prototypes, and verify its capabilities on a real-world figure-ground segmentation task.

Alexander Denecke, Heiko Wersing, Jochen J. Steil, Edgar Körner
Application of Supervised Pareto Learning Self Organizing Maps and Its Incremental Learning

We have proposed Supervised Pareto Learning Self Organizing Maps (SP-SOM), based on the concept of Pareto optimality for the integration of multiple vectors, and applied SP-SOM to a biometric authentication system that uses multiple behavioral characteristics as feature vectors. In this paper, we examine the performance of SP-SOM on a generic classification problem using the iris data set. Furthermore, we propose an incremental learning algorithm for SP-SOM and examine its effectiveness in a classification problem and its ability to adapt to changes in the behavioral biometric features over time.

Hiroshi Dozono, Shigeomi Hara, Shinsuke Itou, Masanori Nakakuni
Gamma SOM for Temporal Sequence Processing

In this paper, we introduce the Gamma SOM model for temporal sequence processing. The standard SOM is merged with a new context descriptor based on a short-term memory structure called Gamma memory. The proposed model allows increasing depth without losing resolution, by adding more contexts. When a single stage of the Gamma filter is used, the Merge SOM model is recovered. The temporal quantization error is used as a performance measure. Simulation results are presented using two data sets: the Mackey-Glass time series and the Bicup 2006 challenge time series. Gamma SOM achieved a lower temporal quantization error than Merge SOM on these data sets.

Pablo A. Estévez, Rodrigo Hernández
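The Gamma memory on which Gamma SOM builds is a cascade of leaky integrators. The sketch below shows only the classic scalar gamma memory, with tap recursion x_0(t) = u(t) and x_k(t) = (1 - mu) * x_k(t-1) + mu * x_{k-1}(t-1). The abstract does not state how Gamma SOM applies this recursion to the winner's weight and context vectors, so that part is not reproduced here. The function name is an illustrative assumption.

```python
def gamma_memory(signal, order, mu):
    """Classic gamma memory over a scalar signal.

    Taps: x_0(t) = u(t);  x_k(t) = (1 - mu) * x_k(t-1) + mu * x_{k-1}(t-1).
    Returns the tap vector [x_0(t), ..., x_order(t)] for every time step.
    """
    taps = [0.0] * (order + 1)  # all taps start at rest: x_k(-1) = 0
    history = []
    for u in signal:
        prev = taps
        taps = [u] + [(1 - mu) * prev[k] + mu * prev[k - 1]
                      for k in range(1, order + 1)]
        history.append(taps)
    return history
```

With mu = 1 each tap is simply the previous tap delayed by one step, so the memory reduces to a tapped delay line. Smaller mu spreads each tap over a longer stretch of the past, trading resolution for depth. Adding taps (a higher order) deepens the memory instead.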
Fuzzy Variant of Affinity Propagation in Comparison to Median Fuzzy c-Means

In this paper we extend the crisp Affinity Propagation (AP) cluster algorithm to a fuzzy variant. AP is a message passing algorithm based on max-sum optimization for factor graphs. It is therefore also applicable to data sets for which only dissimilarities are known, and these may be asymmetric. The proposed Fuzzy Affinity Propagation algorithm (FAP) returns fuzzy assignments to the cluster prototypes based on a probabilistic interpretation of the usual AP. To evaluate the performance of FAP, we compare its clustering results on different experimental and real-world problems with solutions obtained by Median Fuzzy c-Means (M-FCM) and Fuzzy c-Means (FCM). As a measure of cluster agreement we use a fuzzy extension of Cohen’s κ based on t-norms.

T. Geweniger, D. Zühlke, B. Hammer, Thomas Villmann
Clustering with Swarm Algorithms Compared to Emergent SOM

Swarm-based methods are promising nature-inspired techniques. A swarm of stochastic agents performs the task of clustering high-dimensional data on a low-dimensional output space. Most swarm methods are derivatives of the Ant Colony Clustering (ACC) approach proposed by Lumer and Faieta. Compared to clustering on Emergent Self-Organizing Maps (ESOM), these methods usually perform poorly in terms of topographic mapping and cluster formation. This paper presents a unifying representation for ACC methods and Emergent Self-Organizing Maps, in which ACC terms are related to corresponding mechanisms of the SOM. This leads to insights into both algorithms: ACC methods can be considered first-degree relatives of the ESOM, which explains the benefits and shortcomings of ACC and ESOM. Furthermore, the proposed unification allows one to judge whether modifications improve an algorithm’s clustering abilities or not. This is demonstrated using a set of critical clustering problems.

Lutz Herrmann, Alfred Ultsch
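For readers unfamiliar with ACC, its core mechanism is a pair of pick-up and drop probabilities driven by a local similarity density f. The sketch below follows the rules as they are commonly stated for the Lumer-Faieta model: f is the averaged (1 - d/alpha) similarity over an s x s patch, clipped at zero; p_pick = (k1 / (k1 + f))^2; p_drop = 2f if f < k2, else 1. Treat the exact constants and forms as assumptions to check against the original papers. The agent movement loop and the ESOM mapping discussed in this paper are not shown.

```python
import math


def neighborhood_density(item, neighbors, alpha, s):
    """Local similarity density f of an item against items in an s x s grid patch.

    alpha scales dissimilarities; negative totals are clipped to 0.
    """
    total = sum(1.0 - math.dist(item, other) / alpha for other in neighbors)
    return max(0.0, total / (s * s))


def pick_probability(f, k1):
    """High when the item is surrounded by dissimilar items (low f)."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2):
    """High when a carried item would land among similar items (high f)."""
    return 2.0 * f if f < k2 else 1.0
```

Read against the SOM, f plays a role loosely comparable to a neighborhood-weighted match between an item and its surroundings. This is the kind of correspondence the unifying representation in this paper makes precise.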
Cartograms, Self-Organizing Maps, and Magnification Control

This paper presents a simple way to compensate for the magnification effect of Self-Organizing Maps (SOM) when creating cartograms using Carto-SOM. It starts with a brief explanation of what a cartogram is, how it can be used, and what sort of metrics can be used to assess its quality. The methodology for creating a cartogram with a SOM is then presented, together with an explanation of how the magnification effect can be compensated for in this case by pre-processing the data. Examples of cartograms produced with this method are given, concluding that Self-Organizing Maps can be used to produce high-quality cartograms, even using only standard software implementations of SOM.

Roberto Henriques, Fernando Bação, Victor Lobo
Concept Mining with Self-Organizing Maps for the Semantic Web

In this paper, we discuss problems related to the basic Semantic Web methodologies that are based on predicate logic and related formalisms. We discuss complementary and alternative approaches. In particular, we suggest how the Self-Organizing Map can be a basis for making the Semantic Web more semantic.

Timo Honkela, Matti Pöllä
ViSOM for Dimensionality Reduction in Face Recognition

The self-organizing map (SOM) is a classical neural network method for dimensionality reduction and data visualization. Visualization-induced SOM (ViSOM) and growing ViSOM (gViSOM) are two recently proposed variants for a more faithful, metric-based and direct data representation. They learn local quantitative distances of data by regularizing the inter-neuron contraction force while capturing the topology and minimizing the quantization error. In this paper we first review related dimension reduction methods, and then examine their capabilities for face recognition. The experiments were conducted on the ORL face database, and the results show that both ViSOM and gViSOM significantly outperform SOM, PCA and related methods in terms of recognition error rate. When training with five faces, the error rate of gViSOM dimension reduction followed by a soft k-NN classifier is as low as 2.1%, making ViSOM an efficient approach for data representation and dimensionality reduction.

Weilin Huang, Hujun Yin
Early Recognition of Gesture Patterns Using Sparse Code of Self-Organizing Map

We propose a new gesture recognition method which is called “early recognition”. Early recognition is a method to recognize sequential patterns at their beginning parts. Therefore, in the case of gesture recognition, we can get a recognition result of human gestures before the gestures have finished. We realize early recognition by using sparse codes of Self-Organizing Map.

Manabu Kawashima, Atsushi Shimada, Rin-ichiro Taniguchi
Bag-of-Features Codebook Generation by Self-Organisation

Bag of features is a well-established technique for the visual categorisation of objects, categories of objects and textures. One of the most important parts of this technique is codebook generation, since the codebook's within-class and between-class discrimination power is the main factor in categorisation accuracy. A codebook is generated from regions of interest extracted automatically from a set of labeled (supervised/semi-supervised) or unlabeled (unsupervised) images. A standard tool for codebook generation is the c-means clustering algorithm, and state-of-the-art results have been reported using generation schemes based on c-means. In this work, we challenge this mainstream approach by demonstrating how the competitive learning principle in the self-organising map (SOM) is able to provide similar and often superior results to c-means. Therefore, we claim that exploiting the self-organisation principle is an alternative research direction to the mainstream research in visual object categorisation, and its importance for the ultimate challenge, unsupervised visual object categorisation, needs to be investigated.

Teemu Kinnunen, Joni-Kristian Kamarainen, Lasse Lensu, Heikki Kälviäinen
On the Quantization Error in SOM vs. VQ: A Critical and Systematic Study

The self-organizing map (SOM) is related to the classical vector quantization (VQ). Like in the VQ, the SOM represents a distribution of input data vectors using a finite set of models. In both methods, the quantization error (QE) of an input vector can be expressed, e.g., as the Euclidean norm of the difference of the input vector and the best-matching model. Since the models are usually optimized in the VQ so that the sum of the squared QEs is minimized for the given set of training vectors, a common notion is that it will be impossible to find models that produce a smaller rms QE. Therefore it has come as a surprise that in some cases the rms QE of a SOM can be smaller than that of a VQ with the same number of models and the same input data. This effect may manifest itself if the number of training vectors per model is on the order of small integers and the testing is made with an independent set of test vectors. An explanation seems to ensue from statistics. Each model vector in the VQ is determined as the average of those training vectors that are mapped into the same Voronoi domain as the model vector. On the contrary, each model vector of the SOM is determined as a weighted average of all of those training vectors that are mapped into the “topological” neighborhood around the corresponding model. The number of training vectors mapped into the neighborhood of a SOM model is generally much larger than that mapped into a Voronoi domain around a model in the VQ. Since the SOM model vectors are then determined with a significantly higher statistical accuracy, the Voronoi domains of the SOM are significantly more regular, and the resulting rms QE may then be smaller than in the VQ. However, the effective dimensionality of the vectors must also be sufficiently high.

Teuvo Kohonen, Ilari T. Nieminen, Timo Honkela
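The statistical argument in this abstract rests on two update rules. In VQ, each model is the mean of its own Voronoi set. In the batch SOM, each model is a neighborhood-weighted mean over all training vectors. A minimal sketch follows, assuming a 1-D map with a Gaussian neighborhood. Setting sigma = 0 recovers the VQ (k-means) step, and the rms QE follows the definition given above. The function names and the 1-D layout are illustrative assumptions.

```python
import math


def bmu(x, models):
    """Index of the best-matching model (Euclidean distance)."""
    return min(range(len(models)), key=lambda i: math.dist(x, models[i]))


def rms_qe(data, models):
    """Root-mean-square quantization error: sqrt(mean ||x - m_bmu(x)||^2)."""
    return math.sqrt(sum(math.dist(x, models[bmu(x, models)]) ** 2
                         for x in data) / len(data))


def batch_update(data, models, sigma):
    """One batch step on a 1-D map of models.

    sigma == 0: VQ rule, each model becomes the mean of its own Voronoi set.
    sigma > 0: batch SOM rule, each model becomes a weighted mean of all
    training vectors, weighted by a Gaussian of the map distance to their BMU.
    """
    dim = len(models[0])
    winners = [bmu(x, models) for x in data]
    new_models = []
    for i in range(len(models)):
        num, den = [0.0] * dim, 0.0
        for x, j in zip(data, winners):
            if i == j:
                h = 1.0
            elif sigma > 0:
                h = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
            else:
                h = 0.0
            den += h
            num = [a + h * b for a, b in zip(num, x)]
        # Keep a model unchanged if no training vector contributes to it.
        new_models.append([a / den for a in num] if den > 0 else list(models[i]))
    return new_models
```

With sigma > 0 every model averages over more training vectors than its own Voronoi set. This is the source of the higher statistical accuracy the abstract credits for the occasionally smaller rms QE on independent test data.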
Approaching the Time Dependent Cocktail Party Problem with Online Sparse Coding Neural Gas

We show how the “Online Sparse Coding Neural Gas” algorithm can be applied to a more realistic model of the “Cocktail Party Problem”. We consider a setting where more sources than observations are given and additive noise is present. Furthermore, we make the model even more realistic, by allowing the mixing matrix to change slowly over time. We also process the data in an online pattern-by-pattern way where each observation is presented only once to the learning algorithm. The sources are estimated immediately from the observations. In order to evaluate the influence of the change rate of the time dependent mixing matrix and the signal-to-noise ratio on the reconstruction performance with respect to the underlying sources and the true mixing matrix, we use artificial data with known ground truth.

Kai Labusch, Erhardt Barth, Thomas Martinetz
Career-Path Analysis Using Optimal Matching and Self-Organizing Maps

This paper is devoted to the analysis of career paths and employability. The state of the art on this topic offers few methodologies. Some authors propose distances well adapted to the data, but limit their analysis to hierarchical clustering. Other authors apply sophisticated methods, but only after paying the price of transforming the categorical data into continuous data via a factorial analysis. The latter approach has an important drawback, since it makes a linearity assumption about the data. We propose a new methodology, inspired by biology and adapted to career paths, that combines optimal matching and self-organizing maps. A complete study on real-life data illustrates our proposal.

Sébastien Massoni, Madalina Olteanu, Patrick Rousset
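Optimal matching, borrowed from biological sequence alignment, measures the dissimilarity between two state sequences as the cheapest series of insertions, deletions and substitutions turning one into the other. A minimal dynamic-programming sketch follows. The state labels ("E" employed, "U" unemployed) and the default costs (indel 1, substitution 2) are illustrative assumptions. The paper's cost scheme and the way the resulting dissimilarities feed the SOM are not shown.

```python
def optimal_matching(a, b, indel=1.0, sub_cost=None):
    """Optimal-matching (edit) distance between two state sequences.

    sub_cost(x, y) gives the substitution cost; by default 0 if equal and 2
    otherwise, so a substitution costs the same as a deletion plus an insertion.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 2.0
    n, m = len(a), len(b)
    # d[i][j] = cost of aligning the first i states of a with the first j of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,       # delete a[i-1]
                          d[i][j - 1] + indel,       # insert b[j-1]
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[n][m]
```

For example, the monthly paths EEU and EUU are at distance 2 under the default costs and at distance 1 if substitutions cost 1. Because the measure works directly on categorical states, it avoids the linear transformation criticized in the abstract.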
Network-Structured Particle Swarm Optimizer with Various Topology and Its Behaviors

This study proposes a Network-Structured Particle Swarm Optimizer (NS-PSO) with various neighborhood topologies: rectangular, hexagonal, cylindrical and toroidal. We apply NS-PSO with these topologies to optimization problems, investigate their behaviors, and evaluate which topology is most appropriate for each function.

Haruna Matsushita, Yoshifumi Nishio
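In a network-structured PSO, each particle follows the best position found within its lattice neighborhood rather than the whole swarm's best. A minimal sketch for the toroidal case follows. It assumes a 4-neighbor (von Neumann) torus, the commonly used coefficients w = 0.729 and c1 = c2 = 1.494, and hypothetical function names. It is not the authors' exact update rule.

```python
import random


def torus_neighbors(idx, rows, cols):
    """Self plus the four von Neumann neighbors of particle idx on a rows x cols torus."""
    r, c = divmod(idx, cols)
    return [idx,
            ((r - 1) % rows) * cols + c, ((r + 1) % rows) * cols + c,
            r * cols + (c - 1) % cols, r * cols + (c + 1) % cols]


def lattice_pso(f, dim, rows=4, cols=4, iters=200, bound=5.0, seed=0,
                w=0.729, c1=1.494, c2=1.494):
    """Minimize f with a local-best PSO whose neighborhoods come from a torus."""
    rng = random.Random(seed)
    n = rows * cols
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    for _ in range(iters):
        for i in range(n):
            # Local best among the particle's lattice neighborhood.
            nb = min(torus_neighbors(i, rows, cols), key=pbest_val.__getitem__)
            lbest = pbest[nb]
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (lbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
    best = min(range(n), key=pbest_val.__getitem__)
    return pbest[best], pbest_val[best]
```

Only torus_neighbors encodes the topology. Dropping the wrap-around in one direction would give a cylinder, dropping it in both a rectangle, and a hexagonal lattice would need a six-neighbor variant. That locality is what makes comparing topologies, as this study does, straightforward.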
Representing Semantic Graphs in a Self-Organizing Map

A long-standing problem in the field of connectionist language processing has been how to represent detailed linguistic structure. Approaches have ranged from the encoding of syntactic trees in RAAM to the use of a mechanism to query meanings in a “gestalt layer”. In this article, a technique called semantic self-organization is presented that allows for the optimal allocation and explicit representation of semantic dependency graphs on a SOM-based grid. This technique has been successfully used in a connectionist natural language processing architecture called InSomNet to scale up the subsymbolic approach to represent sentences in the LinGO Redwoods HPSG Treebank drawn from the VerbMobil Project and annotated with rich semantic information. InSomNet was also shown to retain the cognitively plausible behavior detailed in psycholinguistics research. Consequently, semantic self-organization holds considerable promise as a basis for real-world natural language understanding systems that mimic human linguistic performance.

Marshall R. Mayberry, Risto Miikkulainen
Analytic Comparison of Self-Organising Maps

SOMs have proven to be a very powerful tool for data analysis. However, comparing multiple SOMs trained on the same data set using different parameters or initialisations is still a difficult task. In most cases it is performed only via visual inspection or by utilising one of a range of quality measures to compare vector quantisation or topology preservation characteristics of the maps. Yet comparing SOMs systematically is both necessary and a powerful tool for further analysing data: necessary, because it may help to pick the most suitable SOM out of different training runs; a powerful tool, because it allows analysing mapping stabilities across a range of parameter settings. In this paper we present an analytic approach to compare multiple SOMs trained on the same data set. Analysis of output space mapping, supported by a set of visualisations, reveals data co-locations and shifts on pairs of SOMs, considering different neighbourhood sizes at both the source and target maps. A similar concept of mutual distances and relationships can be analysed at a cluster level. Finally, comparisons aggregated automatically across several SOMs are strong indicators of the strength and stability of mappings.

Rudolf Mayer, Robert Neumayer, Doris Baum, Andreas Rauber
Modeling the Bilingual Lexicon of an Individual Subject

The lexicon is a central component in any language processing system, whether human or artificial. Recent empirical evidence suggests that a multilingual lexicon consists of a single component representing word meanings and separate components for the symbols in each language. These components can be modeled as self-organizing maps, with associative connections between them implementing comprehension and production. Computational experiments in this paper show that such a model can be trained to match the proficiency and age of acquisition of particular bilingual individuals. In the future, it may be possible to use such models to predict the effect of rehabilitation in bilingual aphasia, resulting in more effective treatments.

Risto Miikkulainen, Swathi Kiran
Self-Organizing Maps with Non-cooperative Strategies (SOM-NC)

The training scheme in self-organizing maps consists of two phases: i) competition, in which all units try to become the best matching unit (BMU), and ii) cooperation, in which the BMU allows its neighbor units to adapt their weight vectors. In order to study the relevance of cooperation, we present a model in which units do not necessarily cooperate with their neighbors, but follow some strategy. The strategy concept is inherited from game theory, and it establishes whether or not the BMU will allow its neighbors to learn the input stimulus. Different strategies are studied, including unconditional cooperation as in the original model, unconditional defection, and several history-based schemes. Each unit is allowed to change its strategy in accordance with some heuristics. We give evidence of the relevance of non-permanent cooperator units in achieving good maps, and we show that self-organization is possible when cooperation is not a constraint.

Antonio Neme, Sergio Hernández, Omar Neme, Leticia Hernández
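The key change relative to the standard SOM is that the BMU's strategy gates the cooperation phase. A minimal sketch follows for a 1-D map with fixed per-unit strategies. It assumes "C" means the BMU lets its Gaussian neighborhood learn, and "D" means only the BMU itself is updated. The history-based strategies and strategy-switching heuristics studied in the paper are not modeled, and the function name and defaults are hypothetical.

```python
import math


def som_nc_step(x, weights, strategies, lr=0.5, sigma=1.0):
    """One online SOM update on a 1-D map where the BMU's strategy gates cooperation.

    strategies[i] == "C": the BMU cooperates, so neighbors learn as in the standard SOM.
    strategies[i] == "D": the BMU defects, so only the BMU itself is updated.
    Updates weights in place and returns the BMU index.
    """
    bmu = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
    cooperate = strategies[bmu] == "C"
    for i, w in enumerate(weights):
        if i == bmu:
            h = 1.0
        elif cooperate:
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        else:
            h = 0.0
        weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

With every unit set to "C" this reduces to ordinary online SOM training. With every unit set to "D" it reduces to winner-take-all competitive learning, which has no topological ordering pressure.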
Analysis of Parliamentary Election Results and Socio-Economic Situation Using Self-Organizing Map

The complex phenomena of political science are typically studied using a qualitative approach, potentially supported by hypothesis-driven statistical analysis of some numerical data. In this article, we present a complementary method based on data mining, and specifically on the use of the self-organizing map. The idea in data mining is to explore the given data without predetermined hypotheses. As a case study, we explore the relationship between parliamentary election results and the socio-economic situation in Finland between 1954 and 2003.

Pyry Niemelä, Timo Honkela
Line Image Classification by NG×SOM: Application to Handwritten Character Recognition

A method for generating a self-organizing map of line images is proposed. In the proposed method, called the NG×SOM, a set of data distributions is represented by a product space organized by a set of neural gas networks (NGs) and a self-organizing map (SOM). In this paper, it is assumed that the line images dealt with by the NG×SOM have the same, yet unknown, topology. Thus the task of the NG×SOM is to generate a map of line images with the same topology, in which the images are continuously and naturally morphed from one into another. We applied the NG×SOM to a handwritten character recognition task. The results obtained show that this method is effective, particularly when the number of training data is small.

Makoto Otani, Kouichi Gunya, Tetsuo Furukawa
Self-Organization of Tactile Receptive Fields: Exploring Their Textural Origin and Their Representational Properties

In our earlier work, we found that the feature space induced by tactile receptive fields (TRFs) is better than that induced by visual receptive fields (VRFs) in texture boundary detection tasks. This suggests that TRFs could be intimately associated with texture-like input. In this paper, we investigate how TRFs can develop in a cortical learning context. Our main hypothesis is that TRFs can be self-organized using the same cortical development mechanism found in the visual cortex, simply by exposing it to texture-like inputs (as opposed to natural-scene-like inputs). To test our hypothesis, we used the LISSOM model of visual cortical development. Our main results show that texture-like inputs lead to the self-organization of TRFs, while natural-scene-like inputs lead to VRFs. These results suggest that TRFs can represent texture better than VRFs. We further analyzed the effectiveness of TRFs in representing texture using kernel Fisher discriminant (KFD) analysis, and the results, along with texture classification performance, confirm that this is indeed the case. We expect these results to help us better understand the nature of texture as a fundamentally tactile property.

Choonseog Park, Heeyoul Choi, Yoonsuck Choe
Visualization by Linear Projections as Information Retrieval

We apply a recent formalization of visualization as information retrieval to linear projections. We introduce a method that optimizes a linear projection for an information retrieval task: retrieving neighbors of input samples based on their low-dimensional visualization coordinates only. The simple linear projection makes the method easy to interpret, while the visualization task is made well-defined by the novel information retrieval criterion. The method has a further advantage: it projects input features, but the input neighborhoods it preserves can be given separately from the input features, e.g. by external data of sample similarities. Thus the visualization can reveal the relationship between data features and complicated data similarities. We further extend the method to kernel-based projections.

Jaakko Peltonen
Analyzing Domestic Violence with Topographic Maps: A Comparative Study

Topographic maps are an appealing exploratory instrument for discovering new knowledge from databases. In recent years, several variations on the Self-Organizing Map (SOM) have been introduced in the literature. In this paper, the toroidal Emergent SOM tool and the spherical SOM are used to analyze a text corpus consisting of police reports of all violent incidents that occurred during the first quarter of 2006 in the police region Amsterdam-Amstelland (The Netherlands). It is demonstrated that spherical topographic maps provide a powerful instrument for analyzing this dataset. In addition, the performance of the toroidal Emergent SOM is compared to that of the spherical SOM, and it turned out to be superior to that of an ordinary classifier applied directly to the data.

Jonas Poelmans, Paul Elzinga, Stijn Viaene, Guido Dedene, Marc M. Van Hulle
On the Finding Process of Volcano-Domain Ontology Components Using Self-Organizing Maps

Monitoring volcanic activity is a task that requires people from a number of disciplines. Infrastructure, on the other hand, has been built all over the world to keep track of these living earth entities, i.e., volcanoes. In this paper we present an approach that merges a number of computational tools and that may be incorporated into existing ones to predict important volcanic events. It mainly consists of applying artificial learning, ontologies, and software agents to the analysis, organization, and use of volcanic-domain data for the benefit of communities living near volcanoes. This proposal gives domain experts a view of the knowledge contained in, and extractable from, the Volcanic-Domain Digital Archives (VDDA). Domain-specific knowledge components, once further processed and embedded into the digital archive itself, can be shared with and manipulated by software agents. In this first study, we apply Self-Organizing Maps (SOM) to volcano-domain signals originating from the activity of the Volcano of Colima, Mexico. By applying this algorithm we have generated clusters of volcanic activity and can readily identify families of important events.

J. R. G. Pulido, M. A. Aréchiga, E. M. R. Michel, G. Reyes, V. Zobin
Elimination of Useless Neurons in Incremental Learnable Self-Organizing Map

We propose a method to eliminate unnecessary neurons in Variable-Density Self-Organizing Map. We have defined an energy function which denotes the error of the map, and optimize the energy function by using graph cut algorithm. We conducted experiments to investigate the effectiveness of our approach.

Atsushi Shimada, Rin-ichiro Taniguchi
Hierarchical PCA Using Tree-SOM for the Identification of Bacteria

In this paper we present an extended version of Evolving Trees using Oja’s rule. Evolving Trees are extensions of Self-Organizing Maps developed for hierarchical classification systems. They are therefore well suited for taxonomic problems like the identification of bacteria. The paper focuses on the clustering and visualization of bacteria measurements. A modified variant of the Evolving Tree is developed and applied to obtain a hierarchical clustering. The method provides an inherent PCA, which is analyzed in combination with the tree-based visualization. The obtained loadings provide insight into the classification decision and can be used to identify features that are relevant for cluster separation.

Stephan Simmuteit, Frank-Michael Schleif, Thomas Villmann, Markus Kostrzewa
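Oja's rule is the ingredient that gives this extended Evolving Tree its inherent PCA. A single linear neuron trained with w <- w + lr * y * (x - y * w), where y = w . x, converges (up to sign) to the unit-length leading principal direction of zero-mean data. The sketch below shows only this rule on plain vectors. How the paper attaches it to tree nodes is not reproduced, and the function name and defaults are assumptions.

```python
import random


def oja_first_component(data, lr=0.01, epochs=50, seed=0):
    """Estimate the leading principal direction of (approximately) zero-mean data
    with Oja's rule: w <- w + lr * y * (x - y * w), where y = w . x.
    The result has roughly unit norm; its sign is arbitrary.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))
            # The -y^2 * w term keeps ||w|| near 1 without explicit normalization.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w
```

The components of the converged w are the loadings. Large absolute entries mark the input features that dominate the direction of greatest variance, which is how the abstract's loadings can point to features relevant for cluster separation.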
Optimal Combination of SOM Search in Best-Matching Units and Map Neighborhood

The distribution of a class of objects, such as images depicting a specific topic, can be studied by observing the best-matching units (BMUs) of the objects’ feature vectors on a Self-Organizing Map (SOM). When the BMU “hits” on the map are summed up, the class distribution may be seen as a two-dimensional histogram or discrete probability density. Due to the SOM’s topology preserving property, one is motivated to smooth the value field and spread out the values spatially to neighboring units, from where one may expect to find further similar objects. In this paper we study the impact of using more map units than just the single BMU of each feature vector in modeling the class distribution. We demonstrate that by varying the number of units selected in this way and varying the width of the spatial convolution one can find an optimal combination which maximizes the class detection performance.

Mats Sjöberg, Jorma Laaksonen
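The hit counting and spatial smoothing described in the abstract above can be sketched as follows. This is an illustrative NumPy version under our own assumptions (a rectangular grid stored row-major, Euclidean distances, a Gaussian convolution kernel without wrap-around, and each vector's unit of mass split evenly over its `n_bmus` nearest units); the function name `class_density` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def class_density(codebook, grid_shape, features, n_bmus=1, sigma=1.0):
    """Smoothed BMU-hit histogram of one object class on a rectangular SOM.

    codebook   -- (rows * cols, dim) model vectors, stored row-major on the grid
    grid_shape -- (rows, cols) of the map
    features   -- (n, dim) feature vectors of the class
    n_bmus     -- number of nearest units each vector votes for (1 = BMU only)
    sigma      -- width (> 0) of the Gaussian spatial convolution, in grid units
    """
    rows, cols = grid_shape
    # Squared Euclidean distance from every feature vector to every map unit.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_bmus]
    hits = np.zeros(rows * cols)
    for units in nearest:
        hits[units] += 1.0 / n_bmus  # each vector spreads one unit of mass
    # Gaussian kernel over grid coordinates; no wrap-around at the borders.
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    coords = np.stack([r.ravel(), c.ravel()], axis=1)
    g2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    smoothed = np.exp(-g2 / (2.0 * sigma ** 2)) @ hits
    return (smoothed / smoothed.sum()).reshape(rows, cols)


# Usage: 3 x 4 map with random model vectors and a class of 20 vectors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(12, 3))
features = rng.normal(size=(20, 3))
density = class_density(codebook, (3, 4), features, n_bmus=2, sigma=1.0)
```

Setting `n_bmus=1` with a very small `sigma` recovers the plain BMU histogram; the two parameters correspond to the two quantities the abstract varies when searching for the best combination.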
Sparse Linear Combination of SOMs for Data Imputation: Application to Financial Database

This paper presents a new methodology for imputing missing values in a database. The methodology combines the outputs of several Self-Organizing Maps to obtain accurate estimates of the missing values. The maps are combined using MultiResponse Sparse Regression and the Hannan-Quinn Information Criterion. This combination removes the need for a lengthy cross-validation procedure, which speeds up the computation significantly. Furthermore, the experiments show that the accuracy of the imputation is improved.

Antti Sorjamaa, Francesco Corona, Yoan Miche, Paul Merlin, Bertrand Maillet, Eric Séverin, Amaury Lendasse
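The idea of combining several SOM imputations linearly and choosing how many to keep with the Hannan-Quinn criterion, HQ = n ln(RSS/n) + 2k ln(ln n), can be illustrated with a small sketch. Note the substitution: plain greedy forward selection with least squares stands in for MultiResponse Sparse Regression, and the function names, the `tiny` floor on the residual sum of squares, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def hannan_quinn(rss, n, k):
    """Hannan-Quinn criterion for a least-squares fit with k coefficients."""
    return n * np.log(rss / n) + 2.0 * k * np.log(np.log(n))


def combine_som_imputations(preds, target, tiny=1e-12):
    """Greedy forward selection of SOM imputations, scored with Hannan-Quinn.

    preds  -- (n, m); column j holds SOM j's estimates for n entries whose true
              values are known (e.g. values hidden on purpose for validation)
    target -- (n,) true values of those entries
    Returns (selected column indices, least-squares weights).
    Stand-in for MultiResponse Sparse Regression; not the paper's algorithm.
    """
    n, m = preds.shape
    selected, remaining = [], list(range(m))
    best_hq, best_cols, best_w = np.inf, [], None
    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            w, *_ = np.linalg.lstsq(preds[:, cols], target, rcond=None)
            rss = float(np.sum((target - preds[:, cols] @ w) ** 2))
            trials.append((rss, j, w))
        rss, j, w = min(trials, key=lambda trial: trial[0])  # largest RSS drop
        selected.append(j)
        remaining.remove(j)
        hq = hannan_quinn(max(rss, tiny), n, len(selected))
        if hq < best_hq:
            best_hq, best_cols, best_w = hq, list(selected), w
    return best_cols, best_w


# Usage: three SOM outputs; the truth is an exact mix of the first and third.
rng = np.random.default_rng(1)
preds = rng.normal(size=(50, 3))
target = 0.6 * preds[:, 0] + 0.4 * preds[:, 2]
cols, weights = combine_som_imputations(preds, target)
```

The criterion's penalty term 2k ln(ln n) is what allows the number of combined maps to be chosen without a cross-validation loop, which is the speed-up the abstract refers to.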
Towards Semi-supervised Manifold Learning: UKR with Structural Hints

We explore generic mechanisms for introducing structural hints into Unsupervised Kernel Regression (UKR) in order to learn representations of data sequences in a semi-supervised way. These extensions are targeted at representing a dextrous manipulation task. We therefore evaluate the effectiveness of the proposed mechanisms on toy data that mimic the characteristics of the target manipulation task and thereby allow a systematic evaluation.

Jan Steffen, Stefan Klanke, Sethu Vijayakumar, Helge Ritter
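In its basic form, UKR maps latent points to data space with a Nadaraya-Watson estimator, f(x) = Σ_i K(x − x_i) y_i / Σ_j K(x − x_j), and learns the latent positions x_i by minimizing the leave-one-out reconstruction error. The sketch below evaluates both quantities with a Gaussian kernel; it does not include the structural hints proposed in the paper, and the function names and the `bandwidth` parameter are illustrative assumptions.

```python
import numpy as np


def gaussian_weights(d2, bandwidth):
    """Gaussian kernel evaluated on squared latent distances."""
    return np.exp(-0.5 * d2 / bandwidth ** 2)


def ukr_map(x, latents, data, bandwidth=1.0):
    """Nadaraya-Watson mapping f(x) = sum_i K(x - x_i) y_i / sum_j K(x - x_j)."""
    k = gaussian_weights(np.sum((latents - x) ** 2, axis=1), bandwidth)
    return k @ data / k.sum()


def ukr_loo_error(latents, data, bandwidth=1.0):
    """Leave-one-out reconstruction error that UKR minimizes over the latents."""
    d2 = ((latents[:, None, :] - latents[None, :, :]) ** 2).sum(axis=2)
    k = gaussian_weights(d2, bandwidth)
    np.fill_diagonal(k, 0.0)  # a point may not reconstruct itself
    recon = k @ data / k.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum((data - recon) ** 2, axis=1)))


# Usage: points on a circle with ordered versus shuffled 1-D latent positions.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
data = np.stack([np.cos(t), np.sin(t)], axis=1)
ordered = t[:, None]
shuffled = ordered[np.random.default_rng(0).permutation(40)]
err_ordered = ukr_loo_error(ordered, data, bandwidth=0.3)
err_shuffled = ukr_loo_error(shuffled, data, bandwidth=0.3)
```

Latents that respect the sequential order of the data give a far lower leave-one-out error than shuffled ones, which is why hints about sequence structure can guide the latent optimization.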
Construction of a General Physical Condition Judgment System Using Acceleration Plethysmogram Pulse-Wave Analysis

Smoking, being overweight, and stress are among the most common lifestyle-related health problems. Because these conditions have no clear objective symptoms, a daily health check is important. We developed diagnostic software that uses a basic SOM model to show the state of the blood vessels and performs a comprehensive plethysmogram analysis based on four components: the map location (the state of the blood vessels, i.e. vascularity), looseness, pulse rate per minute, and pulse stability.

Heizo Tokutaka, Yoshio Maniwa, Eikou Gonda, Masashi Yamamoto, Toshiyuki Kakihara, Masahumi Kurata, Kikuo Fujimura, Li Shigang, Masaaki Ohkita
Top-Down Control of Learning in Biological Self-Organizing Maps

This paper discusses biological aspects of self-organising maps (SOMs), including a brief review of neurophysiological findings and of classical models of neurophysiological SOMs. We then discuss simulation studies on the role of topographic map representations in training mapping networks and on top-down control of map plasticity.

Thomas Trappenberg, Pitoyo Hartono, Douglas Rasmusson
Functional Principal Component Learning Using Oja’s Method and Sobolev Norms

In this paper we present a method for functional principal component analysis based on Oja learning and the neural gas vector quantizer. Instead of the Euclidean inner product, its Sobolev counterpart is applied; it takes the derivatives of the functional data into account and therefore uses the information contained in the functional shape of the data. We investigate the theoretical foundations of the algorithm with respect to convergence and stability and give exemplary applications.

Thomas Villmann, Barbara Hammer
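Oja's rule learns a leading principal component with a single linear neuron: y = ⟨w, x⟩ and Δw = η y (x − y w). Replacing the Euclidean inner product with a discretized first-order Sobolev product ⟨f, g⟩ + ⟨f′, g′⟩ makes the neuron sensitive to the derivatives of the curves. The sketch below is a single-neuron illustration under our own assumptions (curves sampled on a uniform grid with spacing h, forward differences for derivatives, a fixed learning rate); it omits the neural gas quantizer the paper combines it with, and all names are hypothetical.

```python
import numpy as np


def sobolev_inner(f, g, h):
    """Discretized first-order Sobolev inner product <f, g> + <f', g'>."""
    df, dg = np.diff(f) / h, np.diff(g) / h  # forward-difference derivatives
    return h * np.dot(f, g) + h * np.dot(df, dg)


def oja_sobolev(curves, h, eta=0.005, seed=0):
    """One Oja neuron whose Euclidean inner product is replaced by the Sobolev one."""
    rng = np.random.default_rng(seed)
    w = np.ones(curves.shape[1])  # smooth start with Sobolev norm close to 1
    for i in rng.permutation(len(curves)):
        y = sobolev_inner(w, curves[i], h)      # neuron output
        w = w + eta * y * (curves[i] - y * w)   # Oja update
    return w


# Usage: curves dominated by sin(pi t), with a weaker sin(2 pi t) component.
t = np.linspace(0.0, 1.0, 50)
h = t[1] - t[0]
rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 3000))
phi1, phi2 = np.sin(np.pi * t), np.sin(2.0 * np.pi * t)
curves = a[:, None] * phi1 + 0.2 * b[:, None] * phi2
w = oja_sobolev(curves, h)
```

As with the Euclidean rule, the update drives the Sobolev norm of w towards 1, so the learned w approximates the leading eigenfunction normalized in the Sobolev sense.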
A Computational Framework for Nonlinear Dimensionality Reduction and Clustering

We introduce the Exploration Machine (Exploratory Observation Machine, XOM) as a novel, versatile instrument for scientific data analysis and knowledge discovery. XOM systematically inverts structural and functional components of topology-preserving mappings. In contrast to conventional approaches known from the literature, this computational framework for self-organization does not require incorporating additional graphical display or coloring techniques, or modifying topology-preserving mapping algorithms with additional regularization, in order to recover the underlying cluster structure of inhomogeneously distributed input data. XOM can thus be seen as an approach to bridging the gap between nonlinear embedding and classical topology-preserving feature mapping. At the same time, XOM yields considerable computational savings compared with conventional topology-preserving mapping, allowing direct structure-preserving visualization of large data collections without prior data reduction.

Axel Wismüller
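The inversion at the heart of XOM can be illustrated with a short sketch: each data point owns a position in a low-dimensional output space, stimuli are drawn in that output space, the winner is found among the output positions, and the neighbourhood that spreads the update is measured between the data points in input space. This is our own minimal reading of the idea, not the author's implementation; the uniform stimulus distribution on the unit square, the exponential decay schedules, the default neighbourhood widths, and the name `xom_embed` are all assumptions.

```python
import numpy as np


def xom_embed(data, n_iter=5000, eps=(0.5, 0.01), sigma=None, seed=0):
    """Minimal XOM-style 2-D embedding (illustrative sketch, not reference code).

    Each data point x_i owns an output position y_i. Stimuli s are drawn from a
    structure hypothesis in output space (assumed: uniform on the unit square),
    the winner is the y_i closest to s, and the neighbourhood of the winner is
    measured between data points in input space.
    """
    rng = np.random.default_rng(seed)
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:  # assumed schedule: median distance down to 10 % of it
        scale = np.sqrt(np.median(d2[d2 > 0]))
        sigma = (scale, 0.1 * scale)
    y = rng.uniform(size=(len(data), 2))
    for step in range(n_iter):
        frac = step / max(n_iter - 1, 1)
        eps_t = eps[0] * (eps[1] / eps[0]) ** frac          # exponential decay
        sigma_t = sigma[0] * (sigma[1] / sigma[0]) ** frac
        s = rng.uniform(size=2)                              # stimulus
        winner = np.argmin(((y - s) ** 2).sum(axis=1))       # output-space winner
        h = np.exp(-d2[winner] / (2.0 * sigma_t ** 2))       # input-space neighbourhood
        y += eps_t * h[:, None] * (s - y)
    return y


# Usage: two well-separated Gaussian clusters in 5-D.
rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0.0, 0.5, size=(20, 5)),
                  rng.normal(10.0, 0.5, size=(20, 5))])
emb = xom_embed(data)
```

Only the n output positions are adapted and the input-space distances are computed once, which hints at where the computational savings over adapting a full prototype lattice can come from.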
The Exploration Machine – A Novel Method for Data Visualization

We present a novel method for structure-preserving dimensionality reduction. The Exploration Machine (Exploratory Observation Machine, XOM) computes graphical representations of high-dimensional observations through self-organized model adaptation. Although simple and computationally efficient, XOM is surprisingly flexible and contributes simultaneously to several domains of advanced machine learning, scientific data analysis, and visualization, such as structure-preserving dimensionality reduction and data clustering.

Axel Wismüller
Generalized Self-Organizing Mixture Autoregressive Model

The self-organizing mixture autoregressive (SOMAR) model regards a time series as a mixture of regressive processes. A self-organizing algorithm is used together with the LMS algorithm to learn the parameters of these regressive models. The self-organizing map simplifies the mixture to a winner-take-all selection of local models, using an autocorrelation-coefficient-based measure as the similarity measure for identifying the correct local models. SOMAR has previously been shown to uncover underlying autoregressive processes from a mixture. This paper proposes a generalized SOMAR that fully considers the mixing mechanism and the individual model variances, making the modeling and prediction of non-stationary time series more accurate. Experiments on both benchmark and financial time series are presented. The results demonstrate the superiority of the proposed method over other time-series modeling techniques on a range of performance measures.

Hujun Yin, He Ni
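A toy version of the basic SOMAR idea, a chain of local autoregressive models where the best-predicting node and its neighbours are updated by LMS, can be sketched as follows. Two simplifications are assumptions of this sketch: the winner is chosen by the smallest squared one-step prediction error rather than by the paper's autocorrelation-based measure, and the generalized model's mixing weights and per-model variances are not represented. The function name and parameter defaults are hypothetical.

```python
import numpy as np


def somar_sketch(series, n_nodes=6, order=1, n_epochs=5, eta=0.01, sigma=0.5, seed=0):
    """Toy SOMAR-style learner: a 1-D chain of local AR(order) models.

    Winner = node with the smallest squared one-step prediction error (a
    simplification; the paper uses an autocorrelation-based measure). The
    winner and its chain neighbours then take an LMS step on their coefficients.
    """
    rng = np.random.default_rng(seed)
    coef = rng.normal(scale=0.1, size=(n_nodes, order))
    nodes = np.arange(n_nodes)
    for _ in range(n_epochs):
        for t in range(order, len(series)):
            past = series[t - order:t][::-1]          # x[t-1], ..., x[t-order]
            errors = series[t] - coef @ past           # every node's prediction error
            winner = np.argmin(errors ** 2)
            h = np.exp(-((nodes - winner) ** 2) / (2.0 * sigma ** 2))
            coef += eta * (h * errors)[:, None] * past[None, :]   # LMS update
    return coef


# Usage: a series that switches between AR(1) regimes with phi = +0.9 and -0.9.
rng = np.random.default_rng(2)
x = np.zeros(4000)
for t in range(1, len(x)):
    phi = 0.9 if (t // 200) % 2 == 0 else -0.9
    x[t] = phi * x[t - 1] + rng.normal()
coef = somar_sketch(x)
```

After training, some nodes settle near each regime's coefficient, which illustrates how a mixture of local models can uncover the underlying autoregressive processes.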
An SOM-Hybrid Supervised Model for the Prediction of Underlying Physical Parameters from Near-Infrared Planetary Spectra

Near-infrared reflectance spectra of planets can be used to infer surface parameters, sometimes with relevance to recent geologic history. Accurate prediction of parameters (such as composition, temperature, grain size, crystalline state, and dilution of one species within another) is often difficult: the parameters manifest as subtle but significant details in noisy spectral observations, diverse parameters may produce similar spectral signatures, and the feature vectors (spectra) are high-dimensional. Traditional inference methods often fail to meet these challenges. We retrieve two underlying causes of the spectral shapes, temperature and grain size, with an SOM-hybrid supervised neural prediction model, achieving 83.0±2.7% and 100.0±0.0% prediction accuracy for temperature and grain size, respectively. The key to these high accuracies is the exploitation of an interesting antagonistic relationship between the nature of the physical parameters and the learning mode of the SOM in the neural model.

Lili Zhang, Erzsébet Merényi, William M. Grundy, Eliot F. Young
Backmatter
Metadata
Title
Advances in Self-Organizing Maps
Edited by
José C. Príncipe
Risto Miikkulainen
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-02397-2
Print ISBN
978-3-642-02396-5
DOI
https://doi.org/10.1007/978-3-642-02397-2