nach oben

2011 | Buch

Kapitel lesen Erstes Kapitel lesen

Adaptive and Natural Computing Algorithms

10th International Conference, ICANNGA 2011, Ljubljana, Slovenia, April 14-16, 2011, Proceedings, Part II

herausgegeben von: Andrej Dobnikar, Uroš Lotrič, Branko Šter

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

The two-volume set LNCS 6593 and 6594 constitutes the refereed proceedings of the 10th International Conference on Adaptive and Natural Computing Algorithms, ICANNGA 2010, held in Ljubljana, Slovenia, in April 2010. The 83 revised full papers presented were carefully reviewed and selected from a total of 144 submissions. The second volume includes 41 papers organized in topical sections on pattern recognition and learning, soft computing, systems theory, support vector machines, and bioinformatics.

Inhaltsverzeichnis

Frontmatter

Pattern Recognition and Learning

Asymmetric k-Means Algorithm

In this paper, an asymmetric version of the

-means clustering algorithm is proposed. The asymmetry arises caused by the use of asymmetric dissimilarities in the

-means algorithm. Application of asymmetric measures of dissimilarity is motivated with a basic nature of the

-means algorithm, which uses dissimilarities in an asymmetric manner. Clusters centroids are treated as the dominance points governing the asymmetric relationships in the entire cluster analysis. The results of experimental study on the real data have shown the superiority of asymmetric dissimilarities employed for the

-means method over their symmetric counterparts.

Dominik Olszewski

Gravitational Clustering of the Self-Organizing Map

Data clustering is the fundamental data analysis method, widely used for solving problems in the field of machine learning. Numerous clustering algorithms exist, based on various theories and approaches, one of them being the well-known Kohonen’s self-organizing map (SOM). Unfortunately, after training the SOM there is no explicitly obtained information about clusters in the underlying data, so another technique for grouping SOM units has to be applied afterwards. In this paper, a contribution towards clustering of the SOM is presented, employing principles of Gravitational Law. On the first level of the proposed algorithm, SOM is trained on the input data and prototypes are extracted. On the second level, each prototype acts as a unit-mass point in a feature space, in which presence of gravitational force is simulated, exploiting information about connectivity gained on the first level. The proposed approach is capable of discovering complex cluster shapes, not only limited to the spherical ones, and is able to automatically determine the number of clusters. Experiments with synthetic and real data are conducted to show performance of the presented method in comparison with other clustering techniques.

Nejc Ilc, Andrej Dobnikar

A General Method for Visualizing and Explaining Black-Box Regression Models

We propose a method for explaining regression models and their predictions for individual instances. The method successfully reveals how individual features influence the model and can be used with any type of regression model in a uniform way. We used different types of models and data sets to demonstrate that the method is a useful tool for explaining, comparing, and identifying errors in regression models.

Erik Štrumbelj, Igor Kononenko

An Experimental Study on Electrical Signature Identification of Non-Intrusive Load Monitoring (NILM) Systems

Electrical load disambiguation for end-use recognition in the residential sector has become an area of study of its own right. Several works have shown that individual loads can be detected (and separated) from sampling of the power at a single point (e.g. the electrical service entrance for the house) using a non-intrusive load monitoring (NILM) approach. This work presents the development of an algorithm for electrical feature extraction and pattern recognition, capable of determining the individual consumption of each device from the aggregate electric signal of the home. Namely, the idea consists of analyzing the electrical signal and identifying the unique patterns that occur whenever a device is turned on or off by applying signal processing techniques. We further describe our technique for distinguishing loads by matching different signal parameters (step-changes in active and reactive powers and power factor) to known patterns. Computational experiments show the effectiveness of the proposed approach.

Marisa B. Figueiredo, Ana de Almeida, Bernardete Ribeiro

Evaluation of a Resource Allocating Network with Long Term Memory Using GPU

Incremental learning has recently received broad attention in many applications of pattern recognition and data mining. With many typical incremental learning situations in the real world where a fast response to changing data is necessary, developing a parallel implementation (in fast processing units) will give great impact to many applications. Current research on incremental learning methods employs a modified version of a resource allocating network (RAN) which is one variation of a radial basis function network (RBFN). This paper evaluates the impact of a Graphics Processing Units (GPU) based implementation of a RAN network incorporating Long Term Memory (LTM) [4]. The incremental learning algorithm is compared with the batch RBF approach in terms of accuracy and computational cost, both in sequential and GPU implementations. The UCI machine learning benchmark datasets and a real world problem of multimedia forgery detection were considered in the experiments. The preliminary evaluation shows that although the creation of the model is faster with the RBF algorithm, the RAN-LTM can be useful in environments with the need of fast changing models and high-dimensional data.

Bernardete Ribeiro, Ricardo Quintas, Noel Lopes

Gabor Descriptors for Aerial Image Classification

The amount of remote sensed imagery that has become available by far surpasses the possibility of manual analysis. One of the most important tasks in the analysis of remote sensed images is land use classification. This task can be recast as semantic classification of remote sensed images. In this paper we evaluate classifiers for semantic classification of aerial images. The evaluated classifiers are based on Gabor and Gist descriptors which have been long established in image classification tasks. We use support vector machines and propose a kernel well suited for using with Gabor descriptors. These simple classifiers achieve correct classification rate of about 90% on two datasets. From these results follows that, in aerial image classification, simple classifiers give results comparable to more complex approaches, and the pursuit for more advanced solutions should continue having this in mind.

Vladimir Risojević, Snježana Momić, Zdenka Babić

Text Representation in Multi-label Classification: Two New Input Representations

Automatic text classification is the task of assigning unseen documents to a predefined set of classes. Text representation for classification purposes has been traditionally approached using a vector space model due to its simplicity and good performance. On the other hand, multi-label automatic text classification has been typically addressed either by transforming the problem under study to apply binary techniques or by adapting binary algorithms to work with multiple labels. In this paper we present two new representations for text documents based on label-dependent term-weighting for multi-label classification. We focus on modifying the input. Performance was tested with a well-known dataset and compared to alternative techniques. Experimental results based on Hamming loss analysis show an improvement against alternative approaches.

Rodrigo Alfaro, Héctor Allende

Fraud Detection in Telecommunications Using Kullback-Leibler Divergence and Latent Dirichlet Allocation

In this paper, a method for telecommunications fraud detection is proposed. The method is based on the user profiling by employing the Latent Dirichlet Allocation (LDA). The detection of fraudulent behavior is achieved with a threshold-type classification algorithm, allocating the telecommunication accounts into one of two classes: fraudulent account and non-fraudulent account. The accounts are classified with use of the Kullback-Leibler divergence (KL-divergence). Therefore, we also introduce four methods for approximating the KL-divergence between two LDAs. Finally, the results of experimental study on KL-divergence approximation and fraud detection in telecommunications are reported.

Dominik Olszewski

Classification of EEG in A Steady State Visual Evoked Potential Based Brain Computer Interface Experiment

In this paper, electroencephalogram (EEG) signals of 20 subjects are classified in a steady state visual evoked potential (SSVEP) based brain computer interface (BCI) system by using 4 different stimulation frequencies in a program created by Visual C#. After applying proper pre-processing methods, power spectral density (PSD) based features are extracted around first and second harmonics of the stimulation frequencies. Average classification performance obtained from 20 subjects in 4-class classification is 83.62% with Nearest Mean Classifier (NMC). Results for 5-class classification, EEG segment size and gender differences are also analyzed in a detailed manner. The classification method is simple and very suitable for real-time experiments.

Zafer İşcan, Özen Özkaya, Zümray Dokur

Fast Projection Pursuit Based on Quality of Projected Clusters

Projection pursuit index measuring quality of projected clusters (QPC) introduced recently optimizes projection directions by minimizing leave-one-out error searching for pure localized clusters. QPC index has been used in constructive neural networks to discover non-local clusters in high-dimensional multi-class data, reduce dimensionality, aggregate features, visualize and classify data. However, for

training instances such optimization requires

(

) calculations. Fast approximate version of QPC introduced here obtains results of similar quality with

(

) effort, as illustrated in a number of classification and data visualization problems.

Marek Grochowski, Włodzisław Duch

A New N-gram Feature Extraction-Selection Method for Malicious Code

N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. The empirical results from previous works suggest that, while short length n-grams are easier to extract, the characteristics of the underlying executables are better represented in lengthier n-grams. However, by increasing the length of an n-gram, the feature space grows in an exponential manner and much space and computational resources are demanded. And therefore, feature selection has turned to be the most challenging step in establishing an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method where in order to gain more information; both adjacent and non-adjacent bi-grams are used. Additionally, we present a novel boosting feature selection method based on genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best earlier known methods.

Hamid Parvin, Behrouz Minaei, Hossein Karshenas, Akram Beigi

A Robust Learning Model for Dealing with Missing Values in Many-Core Architectures

Most of the classification algorithms (e.g. support vector machines, neural networks) cannot directly handle Missing Values (MV). A common practice is to rely on data pre-processing techniques by using imputation or simply by removing instances and/or features containing MV. This seems inadequate for various reasons: the resulting models do not preserve the uncertainty, these techniques might inject inaccurate values into the learning process, the resulting models are unable to deal with faulty sensors and data in real-world problems is often incomplete. In this paper we look at the Missing Values Problem (MVP) by extending our recently proposed Neural Selective Input Model (NSIM) first, to a novel multi-core architecture implementation and, second, by validating our method in a real-world financial application. The NSIM encompasses different transparent and bound (conceptual) models, according to the multiple combinations of missing attributes. The proposed NSIM is applied to bankruptcy prediction of (healthy and distressed) French companies, yielding much better performance than previous approaches using pre-processing techniques. Moreover, the Graphics Processing Unit (GPU) implementation reduces drastically the time spent in the learning phase, making the NSIM an excellent choice for dealing with the MVP.

Noel Lopes, Bernardete Ribeiro

A Model of Saliency-Based Selective Attention for Machine Vision Inspection Application

A machine vision inspection model of surface defects, inspired by the methodologies of neuroanatomy and psychology, is investigated. Firstly, the features extracted from defect images are combined into a saliency map. The bottom-up attention mechanism then obtains ‘‘what’’ and ‘‘where’’ information. Finally, the Markov model is used to classify the types of the defects. Experimental results demonstrate the feasibility and effectiveness of the proposed model with 94.40% probability of accurately detecting of the existence of cropper strips defects.

Xiao-Feng Ding, Li-Zhong Xu, Xue-Wu Zhang, Fang Gong, Ai-Ye Shi, Hui-Bin Wang

Grapheme-Phoneme Translator for Brazilian Portuguese

This work presents an application for grapheme-phoneme translation for Portuguese language texts based on adaptive automata. The application has a module for grapheme-phoneme translation of words as its core, and input texts are transformed into sequences of words (including numbers, acronyms, etc) that are used as input for the word translation module. The word translation module separates words into sequences of tokens and analyzes their behavior considering stress and influences from adjacent tokens. The paper begins with an overview of the word translation method based on adaptive automata, presents the application for text translation and ends with results of translation tests using texts from Brazilian newspapers.

Danilo Picagli Shibata, Ricardo Luis de Azevedo da Rocha

Soft Computing

Improvement of Inventory Control under Parametric Uncertainty and Constraints

The aim of the present paper is to show how the statistical inference equivalence principle (SIEP), the idea of which belongs to the authors, may be employed in the particular case of finding the effective statistical decisions for the multi-product inventory problems with constraints. To our knowledge, no analytical or efficient numerical method for finding the optimal policies under parametric uncertainty for the multi-product inventory problems with constraints has been reported in the literature. Using the (equivalent) predictive distributions, this paper represents an extension of analytical results obtained for unconstrained optimization under parametric uncertainty to the case of constrained optimization. A numerical example is given.

Nicholas Nechval, Konstantin Nechval, Maris Purgailis, Uldis Rozevskis

Modified Jakubowski Shape Transducer for Detecting Osteophytes and Erosions in Finger Joints

In this paper, a syntactic method of pattern recognition is applied to hand radiographs interpretation, in order to recognize erosions and osteophytes in the finger joints. It is shown that, the classical Jakubowski transducer does not distinguish contours of healthy bones from contours of affected bones. Therefore, the modifications of the transducer are introduced. It is demonstrated, that the modified transducer correctly recognizes the classes of bone shapes obtained based on the medical classification: healthy bone class, erosion bone class and osteophyte bone class.

Marzena Bielecka, Andrzej Bielecki, Mariusz Korkosz, Marek Skomorowski, Wadim Wojciechowski, Bartosz Zieliński

Using CMAC for Mobile Robot Motion Control

Cerebellar Model Articulation Controller (CMAC) has some attractive features: fast learning capability and the possibility of effi- cient digital hardware implementation. These features makes it a good choice for different control applications, like the one presented in this paper. The problem is to navigate a mobile robot (e.g a car) from an initial state to a fixed goal state. The approach applied is backpropagation through time (BPTT). Besides the attractive features of CMAC it has a serious drawback: its memory complexity may be very large. To reduce memory requirement different variants of CMACs were developed. In this paper several variants are used for solving the navigation problem to see if using a network with reduced memory size can solve the problem efficiently. Only those solutions are described in detail that solve the problem in an acceptable level. All of these variants of the CMAC require higher-order basis functions, as for BPTT continuous input-output mapping of the applied neural network is required.

Kristóf Gáti, Gábor Horváth

Optimizing the Robustness of Scale-Free Networks with Simulated Annealing

We study the robustness of Barabási-Albert scale-free networks with respect to intentional attacks to highly connected nodes. Using the simulated annealing optimization heuristic, we rewire the networks such that their robustness to network fragmentation is improved but without changing neither the degree distribution nor the connectivity of single nodes. We show that simulated annealing improves on the results previously obtained with a simple hill-climbing procedure. We also introduce a local move operator in order to facilitate actual rewiring and show numerically that the results are almost equally good.

Pierre Buesser, Fabio Daolio, Marco Tomassini

Numerically Efficient Analytical MPC Algorithm Based on Fuzzy Hammerstein Models

Numerically efficient analytical MPC (Model Predictive Control) algorithm based on fuzzy Hammerstein models is proposed in the paper. Thanks to the form of the model the prediction can be described by analytical formulas and the proposed algorithm is numerically efficient. It is shown that thanks to a clever tuning of the controller most of calculations needed to derive the control value can be performed off–line. Thus, the proposed algorithm has the advantage reserved so far for analytical MPC algorithms based on linear models. At the same time, the algorithm offers practically the same performance as the MPC algorithm in which a nonlinear optimization problem must be solved at each iteration. The efficiency of the algorithm is demonstrated in the control system of a nonlinear control plant with delay.

Piotr M. Marusak

Online Adaptation of Path Formation in UAV Search-and-Identify Missions

In this paper, we propose a technique for optimisation and online adaptation of search paths of unmanned aerial vehicles (UAVs) in search-and-identify missions. In these missions, a UAV has the objective to search for targets and to identify those. We extend earlier work that was restricted to offline generation of search paths by enabling the UAVs to adapt the search path online (i.e., at runtime). We let the UAV start with a pre-planned search path, generated by a Particle Swarm Optimiser, and adapt it at runtime based on expected value of information that can be acquired in the remainder of the mission. We show experimental results from 3 different types of UAV agents: two benchmark agents (one without any online adaptation that we call ‘naive’ and one with predefined online behaviour that we call ‘exhaustive’) and one with adaptive online behaviour, that we call ‘adaptive’. Our results show that the adaptive UAV agent outperforms both the benchmarks, in terms of jointly optimising the search and identify objectives.

Willem H. van Willigen, Martijn C. Schut, A. E. Eiben, Leon J. H. M. Kester

Reconstruction of Causal Networks by Set Covering

We present a method for the reconstruction of networks, based on the order of nodes visited by a stochastic branching process. Our algorithm reconstructs a network of minimal size that ensures consistency with the data. Crucially, we show that global consistency with the data can be achieved through purely local considerations, inferring the neighbourhood of each node in turn. The optimisation problem solved for each individual node can be reduced to a set covering problem, which is known to be NP-hard but can be approximated well in practice. We then extend our approach to account for noisy data, based on the Minimum Description Length principle. We demonstrate our algorithms on synthetic data, generated by an SIR-like epidemiological model.

Nick Fyson, Tijl De Bie, Nello Cristianini

The Noise Identification Method Based on Divergence Analysis in Ensemble Methods Context

In this paper we propose a divergence based method for noise detection in ensemble method context where the prediction results from different models are treated as a multidimensional variable that contains constructive and destructive latent components. The crucial stage is the proper destructive and constructive components classification. We propose to calculate the noisiness of the particular latent component as the divergence from chosen reference noise. It allows us to identify the wide range of noises besides the typical signals with close analytical form such as Gaussian or uniform. The real data experiment with load energy prediction confirms presented methodology.

Ryszard Szupiluk, Piotr Wojewnik, Tomasz Zabkowski

Efficient Predictive Control and Set–Point Optimization Based on a Single Fuzzy Model

The idea proposed in the paper consists in significant simplification of the control structure with a predictive control algorithm and a steady–state target optimization. It is done by application of only one fuzzy (nonlinear) dynamic control plant model for both: predictive control and set–point calculation. The approach exploits possibilities offered by a fuzzy model used by the predictive control algorithm. The fuzzy model is of Takagi–Sugeno type with step responses used as the local models. Such a model can be obtained relatively easy and well tuned using neural networks. The proposed approach, despite simplification of the control system, offers very good control performance. It is demonstrated using an example of a control system of a nonlinear chemical reactor with inverse response.

Piotr M. Marusak

Wind Turbines States Classification by a Fuzzy-ART Neural Network with a Stereographic Projection as a Signal Normalization

In this paper wind turbines operational states classification is considered. The fuzzy-ART neural network is proposed as a classifying system. Applying of stereographic projection as an input signals normalization procedure is introduced. Both theoretical justification is discussed and results of experiments are presented. It turns out that the introduced normalization procedure improves classification results.

Tomasz Barszcz, Marzena Bielecka, Andrzej Bielecki, Mateusz Wójcik

Binding and Cross-Modal Learning in Markov Logic Networks

Binding — the ability to combine two or more modal representations of the same entity into a single shared representation is vital for every cognitive system operating in a complex environment. In order to successfully adapt to changes in an dynamic environment the binding mechanism has to be supplemented with cross-modal learning. In this paper we define the problems of high-level binding and cross-modal learning. By these definitions we model a binding mechanism and a cross-modal learner in a Markov logic network and test the system on a synthetic object database.

Alen Vrečko, Danijel Skočaj, Aleš Leonardis

Chaotic Exploration Generator for Evolutionary Reinforcement Learning Agents in Nondeterministic Environments

In reinforcement learning exploration phase, it is necessary to introduce a process of trial and error to discover better rewards obtained from environment. To this end, one usually uses the uniform pseudorandom number generator in exploration phase. However, it is known that chaotic source also provides a random-like sequence similar to stochastic source. In this paper we have employed the chaotic generator in the exploration phase of reinforcement learning in a nondeterministic maze problem. We obtained promising results in the so called maze problem.

Akram Beigi, Nasser Mozayani, Hamid Parvin

Parallel Graph Transformations Supported by Replicated Complementary Graphs

Graph transformations are the powerful formalism allowing describing a behavior of systems of various types. Parallel computations paradigm makes computations faster if we are able to reduce additional costs related to a communication overhead and a complexity of design of such systems. Replicated complementary graphs concept allows a parallel execution of graph transformation rules (designed for the centralized graph case) on a distributed environment. The possibility and the cost of data replication will be considered in the paper in the context of double-pushout approach.

Leszek Kotulski, Adam Sędziwy

Diagnosis of Cardiac Arrhythmia Using Fuzzy Immune Approach

Classification is an important data mining task in biomedicine. For easy comprehensibility, rules are preferrable to another functions in the analysis of biomedical data. The aim of this work is to use a new fuzzy immune rule-based classification system for a medical diagnosis of a cardiovascular disease. In this study, fuzzy immune approach (FIA), which can be improved by ours, is a new method and firstly, it is applied to ECG dataset. The performance of the proposed approach, in terms of classification accuracy, ROC curves, and area under the ROC curve (AUC) was compared with traditional classifier schemes: C4.5, Naïve Bayes, KStar, Meta END, and ANN. The classification accuracies and AUC statistics of FIA for the data sets used are the highest among the classifiers reported on the UCI website and other classifiers used for related problems and tested by cross validation.

Olgierd Unold

Systems Theory

Adaptive Finite Automaton: A New Algebraic Approach

The purpose is to present a new and better representation for the adaptive finite automaton and to also show that both formulations – the original and the newly created – have the same computational power. Adaptive finite automaton original formulation was explored and a way to overcome some difficulties found by [7] in its representation and proofs about its computational power were sought. Afterwards both formulations show to be equivalent in representation and in computational power, but the new one has a highly simplified algebraic notation. The use of the new formulation actually allows simpler theorem proofs and generalizations, as can be verified in the last section of the paper.

Reginaldo Inojosa Silva Filho, Ricardo Luis de Azevedo da Rocha

Cryptanalytic Attack on the Self-Shrinking Sequence Generator

In this paper, a cryptanalysis on the Self-Shrinking Generator a well known sequence generator with cryptographic application is presented. An improvement in the Guess-and-Determine cryptanalytic technique has been proposed. Numerical results that improve other cryptanalysis developed on such a generator are given. In particular, complexities in the order of

0.2

) for the amount of intercepted sequence,

(

) for computer memory and

0.5

) for execution time (

being the length of the generator register) are obtained. In addition, a specific hardware for a practical cryptanalysis has been proposed.

Maria Eugenia Pazo-Robles, Amparo Fúster-Sabater

About Nonnegative Matrix Factorization: On the posrank Approximation

This work addresses the concept of nonnegative matrix factorization (NMF). Some relevant issues for its formulation as as a non-linear optimization problem will be discussed. The primary goal of NMF is that of obtaining good quality approximations, namely for video/image visualization. The importance of the rank of the factor matrices and the use of global optimization techniques is investigated. Some computational experience is reported indicating that, in general, the relation between the quality of the obtained local minima and the factor matrices dimensions has a strong impact on the quality of the solutions associated with the decomposition.

Ana de Almeida

Stability of Positive Fractional Continuous-Time Linear Systems with Delays

Necessary and sufficient conditions for the asymptotic stability of positive fractional continuous-time linear systems with delays are established. It is shown that: 1) the asymptotic stability of the positive fractional system is independent of their delays, 2) the checking of the asymptotic stability of the positive fractional systems with delays can be reduced to checking of the asymptotic stability of positive standard linear systems without delays.

Tadeusz Kaczorek

Output-Error Model Training for Gaussian Process Models

The training of a regression model depends on the purpose of the model. When a black-box model of dynamic systems is trained, two purposes are particularly common: prediction and simulation. The purpose of this paper is to highlight the differences between the learning of a dynamic-system model for prediction and for simulation in the presence of noise for Gaussian process models. Gaussian process models are probabilistic, nonparametric models that recently generated interest in the machine-learning community. This method can also be used also for the modelling of dynamic systems, which is the main interest of the engineering community. The paper elaborates the differences between prediction- and simulation-purposed modelling in the presence of noise, which is more difficult in the case when we train the model for simulation. An example is given to illustrate the described differences.

Juš Kocijan, Dejan Petelin

Support Vector Machines

Learning Readers’ News Preferences with Support Vector Machines

We explore the problem of learning and predicting popularity of articles from online news media. The only available information we exploit is the textual content of the articles and the information whether they became popular – by users clicking on them – or not. First we show that this problem cannot be solved satisfactorily in a naive way by modelling it as a binary classification problem. Next, we cast this problem as a ranking task of pairs of popular and non-popular articles and show that this approach can reach accuracy of up to 76%. Finally we show that prediction performance can improve if more content-based features are used. For all experiments, Support Vector Machines approaches are used.

Elena Hensinger, Ilias Flaounas, Nello Cristianini

Incorporating a Priori Knowledge from Detractor Points into Support Vector Classification

In this article, we extend the idea of a priori knowledge in the form of detractor points presented recently for Support Vector Classification. We show that detractor points can belong to the new type of support vectors – training samples which lie outside a margin bounded region. We present the new application for a priori knowledge from detractor points – improving generalization performance of Support Vector Classification while reducing a complexity of a model by removing a bunch of support vectors. The experiments show that indeed the new type of a priori knowledge improves generalization performance of reduced models. The tests were performed on selected classification data sets, and on stock price data from public domain repositories.

Marcin Orchel

A Hybrid AIS-SVM Ensemble Approach for Text Classification

In this paper we propose and analyse methods for expanding state-of-the-art performance on text classification. We put forward an ensemble-based structure that includes Support Vector Machines (SVM) and Artificial Immune Systems (AIS). The underpinning idea is that SVM-like approaches can be enhanced with AIS approaches which can capture dynamics in models. While having radically different genesis, and probably because of that, SVM and AIS can cooperate in a committee setting, using a heterogeneous ensemble to improve overall performance, including a confidence on each system classification as the differentiating factor.

Results on the well-known Reuters-21578 benchmark are presented, showing promising classification performance gains, resulting in a classification that improves upon all baseline contributors of the ensemble committee.

Mário Antunes, Catarina Silva, Bernardete Ribeiro, Manuel Correia

Regression Based on Support Vector Classification

In this article, we propose a novel regression method which is based solely on Support Vector Classification. Experiments show that the new method has comparable or better generalization performance than

-insensitive Support Vector Regression. The tests were performed on synthetic data, on various publicly available regression data sets, and on stock price data. Furthermore, we demonstrate how a priori knowledge which has been already incorporated to Support Vector Classification for predicting indicator functions, could be directly used for a regression problem.

Marcin Orchel

Two One-Pass Algorithms for Data Stream Classification Using Approximate MEBs

It has been recently shown that the quadratic programming formulation underlying a number of kernel methods can be treated as a minimal enclosing ball (MEB) problem in a feature space where data has been previously embedded. Core Vector Machines (CVMs) in particular, make use of this equivalence in order to compute Support Vector Machines (SVMs) from very large datasets in the batch scenario. In this paper we study two algorithms for online classification which extend this family of algorithms to deal with large data streams. Both algorithms use analytical rules to adjust the model extracted from the stream instead of recomputing the entire solution on the augmented dataset. We show that these algorithms are more accurate than the current extension of CVMs to handle data streams using an analytical rule instead of solving large quadratic programs. Experiments also show that the online approaches are considerably more efficient than periodic computation of CVMs even though warm start is being used.

Ricardo Ñanculef, Héctor Allende, Stefano Lodi, Claudio Sartori

Bioinformatics

X-ORCA - A Biologically Inspired Low-Cost Localization System

In nature, localization is a very fundamental task for which natural evolution has come up with many powerful solutions. In technical applications, however, localization is still quite a challenge, since most ready-to-use systems are not satisfactory in terms of costs, resolution, and effective range. This paper proposes a new localization system that is largely inspired by auditory system of the barn owl. A first prototype has been implemented on a low-cost field-programmable gate array and is able to determine the time difference of two 300MHz signals with a resolution of about 0.02ns, even though the device is clocked as slow as 85MHz. X-ORCA is able to achieve this performance by adopting some of the core properties of the biological role model.

Enrico Heinrich, Marian Lüder, Ralf Joost, Ralf Salomon

On the Origin and Features of an Evolved Boolean Model for Subcellular Signal Transduction Systems

In this paper we deal with the evolved Boolean model of the subcellular network for a hypothetical subcellular task that performs some of the basic cellular functions. The Boolean network is trained with a genetic algorithm and the obtained results are analyzed. We show that the size of the evolved Boolean network relates strongly to the task, that the number of output combinations is decreased, which is in concordance with the biological (measured) networks, and that the number of non-canalyzing inputs is increased, which indicates its specialization to the task. We conclude that the structure of the evolved network is biologically relevant, since it incorporates properties of evolved biological systems.

Branko Šter, Monika Avbelj, Roman Jerala, Andrej Dobnikar

Similarity of Transcription Profiles for Genes in Gene Sets

In gene set focused knowledge-based analysis we assume that genes from the same functional gene set have similar transcription profiles. We compared the distributions of similarity scores of gene transcription profiles between genes from the same gene sets and genes chosen at random. In line with previous research, our results show that transcription profiles of genes from the same gene sets are on average indeed more similar than random transcription profiles, although the differences are slight. We performed the experiments on 35 human cancer data sets, with KEGG pathways and BioGRID interactions as gene set sources. Pearson correlation coefficient and interaction gain were used as association measures.

Marko Toplak, Tomaż Curk, Blaż Zupan

Backmatter

Titel: Adaptive and Natural Computing Algorithms
herausgegeben von: Andrej Dobnikar
Uroš Lotrič
Branko Šter
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-20267-4
Print ISBN: 978-3-642-20266-7
DOI: https://doi.org/10.1007/978-3-642-20267-4