
2016 | Book

Hybrid Artificial Intelligent Systems

11th International Conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings

Edited by: Francisco Martínez-Álvarez, Alicia Troncoso, Héctor Quintián, Emilio Corchado

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This volume constitutes the refereed proceedings of the 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2016, held in Seville, Spain, in April 2016.

The 63 full papers published in this volume were carefully reviewed and selected from 150 submissions. They are organized in topical sections on data mining and knowledge discovery; time series; bio-inspired models and evolutionary computation; learning algorithms; video and image; classification and cluster analysis; applications; bioinformatics; and hybrid intelligent systems for data mining and applications.

Table of Contents

Frontmatter

Data Mining and Knowledge Discovery

Frontmatter
Screening a Case Base for Stroke Disease Detection

Stroke is one of the most frequent causes of death, regardless of age or gender. Beyond its high mortality figures, the disease also causes long-term disabilities with long recovery times and correspondingly high costs. However, stroke and related diseases may also be prevented by attending to evidence of illness. Therefore, the present work starts with the development of a decision support system to assess stroke risk, centered on a formal framework based on Logic Programming for knowledge representation and reasoning, complemented with a Case Based Reasoning (CBR) approach to computing. Indeed, in order to target the CBR cycle practically, normalization and optimization phases were introduced, and clustering methods were used, thereby reducing the search space and enhancing the case retrieval phase. On the other hand, aiming at an improvement of the CBR theoretical basis, the predicates' attributes were normalized to the interval [0, 1], and the extensions of the predicates that match the universe of discourse were rewritten and set not only in terms of an evaluation of their Quality-of-Information (QoI), but also in terms of an assessment of a Degree-of-Confidence (DoC), a measure of one's confidence that they fit into a given interval, taking into account their domains. That is, each predicate attribute is given in terms of a pair (QoI, DoC), a simple and elegant way to represent data or knowledge that is incomplete, self-contradictory, or even unknown.

José Neves, Nuno Gonçalves, Ruben Oliveira, Sabino Gomes, João Neves, Joaquim Macedo, António Abelha, César Analide, José Machado, Manuel Filipe Santos, Henrique Vicente
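
A minimal sketch of the (QoI, DoC) attribute encoding described above. The specific DoC formula, DoC = sqrt(1 - delta_l^2) with delta_l the normalized interval width, and the binary QoI are assumptions in the spirit of this line of work, not necessarily the paper's exact definitions.

```python
# Hypothetical sketch of the (QoI, DoC) pair encoding for predicate attributes.
# Assumption: DoC = sqrt(1 - delta_l**2), with delta_l the width of the
# attribute interval after normalization of its domain onto [0, 1].
import math

def normalize(interval, domain):
    """Map an attribute interval onto [0, 1] given its domain [lo, hi]."""
    lo, hi = domain
    a, b = interval
    return (a - lo) / (hi - lo), (b - lo) / (hi - lo)

def qoi_doc(interval, domain, known=True):
    """Return the (QoI, DoC) pair for one predicate attribute."""
    if not known:                 # unknown value: interval spans the domain
        interval = domain
    a, b = normalize(interval, domain)
    delta_l = b - a               # normalized interval width, in [0, 1]
    doc = math.sqrt(1.0 - delta_l ** 2)
    qoi = 1.0 if known else 0.0
    return qoi, doc

# Exact, interval-valued and unknown values for a hypothetical attribute
# with domain [60, 200] (illustrative numbers only):
print(qoi_doc((120, 120), (60, 200)))               # exact -> DoC = 1.0
print(qoi_doc((110, 130), (60, 200)))               # interval -> DoC < 1.0
print(qoi_doc((60, 200), (60, 200), known=False))   # unknown -> DoC = 0.0
```
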
SemSynX: Flexible Similarity Analysis of XML Data via Semantic and Syntactic Heterogeneity/Homogeneity Detection

In this paper we introduce and experimentally assess SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. Given two XML trees, SemSynX retrieves a list of semantic and syntactic heterogeneity/homogeneity matches of objects (i.e., elements, values, tags, attributes) occurring in certain paths of the trees. A local score that takes into account path and value similarity is given for each heterogeneity/homogeneity found. A global score that summarizes the number of equal matches as well as the local scores is also provided. The proposed technique is highly customizable: it permits the specification of thresholds for the requested degree of similarity for paths and values, as well as for the degree of relevance for path and value matching. It thus makes it possible to “adjust” the similarity analysis depending on the nature of the input XML trees. SemSynX has been implemented as an XQuery library, so as to enhance interoperability with other XML processing tools. To complete our analytical contributions, a comprehensive experimental assessment and evaluation of SemSynX over several classes of XML documents is provided.

Jesús M. Almendros-Jiménez, Alfredo Cuzzocrea
Towards Automatic Composition of Multicomponent Predictive Systems

Automatic composition and parametrisation of multicomponent predictive systems (MCPSs) consisting of chains of data transformation steps is a challenging task. In this paper we propose and describe an extension to the Auto-WEKA software which now allows composing and optimising such flexible MCPSs as sequences of WEKA methods. In the experimental analysis we focus on examining the impact that significantly extending the search space, by incorporating additional hyperparameters of the models, has on the quality of the found solutions. In a range of extensive experiments, three different optimisation strategies are used to automatically compose MCPSs on 21 publicly available datasets. A comparison with previous work indicates that extending the search space improves the classification accuracy in the majority of cases. The diversity of the found MCPSs is also an indication that fully and automatically exploiting different combinations of data cleaning and preprocessing techniques is possible and highly beneficial for different predictive models. This can have a big impact on the development, maintenance and scalability of high-quality predictive models in modern application and deployment scenarios.

Manuel Martin Salvador, Marcin Budka, Bogdan Gabrys
LiCord: Language Independent Content Word Finder

Content Words (CWs) are important segments of text. In text mining, we use them for various purposes such as topic identification, document summarization and question answering. Usually, the identification of CWs requires various language-dependent tools. However, such tools are not available for many languages, and developing them for all languages is costly. On the other hand, because of the recent growth of text content in various languages, language-independent text mining carries great potential. To mine text automatically, identifying CWs without language tools is a requirement. In this research, we devise a framework that identifies text segments as CWs in a language-independent way. We identify some structural features that relate text segments to CWs. We compute the features over a large text corpus and apply machine learning-based classification to classify the segments as CWs. The proposed framework only uses a large text corpus and some training examples; apart from these, it does not require any language-specific tool. We conducted experiments with our framework for three different languages, English, Vietnamese and Indonesian, and found that it works with more than 83 % accuracy.

Md-Mizanur Rahoman, Tetsuya Nasukawa, Hiroshi Kanayama, Ryutaro Ichise
Mining Correlated High-Utility Itemsets Using the Bond Measure

Mining high-utility itemsets (HUIs) is the task of finding the sets of items that yield a high profit in customer transaction databases. An important limitation of traditional high-utility itemset mining is that only the utility measure is used for assessing the interestingness of patterns. This leads to finding many itemsets that have a high profit but contain items that are weakly correlated. To address this issue, this paper proposes to integrate the concept of correlation into high-utility itemset mining, using the bond measure to find profitable itemsets that are highly correlated. An algorithm named FCHM (Fast Correlated High-utility itemset Miner) is proposed to efficiently discover correlated high-utility itemsets. Experimental results show that FCHM is highly efficient and can prune a huge number of weakly correlated HUIs.

Philippe Fournier-Viger, Jerry Chun-Wei Lin, Tai Dinh, Hoai Bac Le
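
A minimal sketch of the bond measure named above: bond(X) = sup(X) / dis(X), where sup(X) counts transactions containing every item of X and dis(X), the disjunctive support, counts transactions containing at least one item of X. The toy database is illustrative only; FCHM would additionally require itemsets to satisfy a minimum utility threshold.

```python
# Bond measure of an itemset over a transaction database.
def bond(itemset, transactions):
    itemset = set(itemset)
    sup = sum(1 for t in transactions if itemset <= set(t))  # conjunctive support
    dis = sum(1 for t in transactions if itemset & set(t))   # disjunctive support
    return sup / dis if dis else 0.0

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'd'}, {'b', 'c'}, {'a', 'b', 'd'}]
print(bond({'a', 'b'}, db))  # 3 / 5 = 0.6 -> fairly correlated
print(bond({'c', 'd'}, db))  # 0 / 4 = 0.0 -> weakly correlated
```
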
An HMM-Based Multi-view Co-training Framework for Single-View Text Corpora

Multi-view algorithms such as co-training improve the accuracy of text classification because they optimize functions that exploit different views of the same input data. However, despite being more promising than single-view approaches, document datasets often have no natural multiple views available. This study proposes an HMM-based algorithm to generate a new view from a standard text dataset, and a co-training framework where this view generation is applied. Given a dataset and a user classifier model as input, the goal of our framework is to improve the classifier performance by increasing the labelled document pool, taking advantage of the multi-view semi-supervised co-training algorithm. The novel architecture was tested using two standard text corpora, Reuters and 20 Newsgroups, and a classical SVM classifier. The results obtained are promising, showing a significant increase in the efficiency of the classifier compared to a single-view approach.

Eva Lorenzo Iglesias, Adrián Seara Vieira, Lourdes Borrajo Diz
Does Sentiment Analysis Help in Bayesian Spam Filtering?

Unsolicited email campaigns remain one of the biggest threats, affecting millions of users per day. During the last years several techniques to detect unsolicited emails have been developed. Among all the proposed automatic classification techniques, machine learning algorithms have achieved the most success, obtaining detection rates of up to 96 % [1]. This work provides means to validate the assumption that, since spam is a commercial communication, the semantics of its contents are usually shaped with a positive meaning. We produce a polarity score for each message using sentiment classifiers, and then compare spam filtering classifiers with and without the polarity score in terms of accuracy. This work shows that the top 10 results of Bayesian filtering classifiers were improved, reaching 99.21 % accuracy.

Enaitz Ezpeleta, Urko Zurutuza, José María Gómez Hidalgo
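
A hedged sketch of the experimental setup described above: append a per-message polarity score to bag-of-words features and train a Bayesian filter with and without it. The tiny word lists stand in for the sentiment classifiers used in the paper and are purely illustrative.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

POSITIVE = {"free", "win", "amazing", "best", "offer"}
NEGATIVE = {"problem", "cancel", "error"}

def polarity(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

msgs = ["win a free amazing offer now", "meeting moved to friday",
        "best offer ever click here", "error report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vec = CountVectorizer()
X_bow = vec.fit_transform(msgs)
# Clamp at zero: MultinomialNB requires non-negative feature values.
pol = csr_matrix(np.array([[max(polarity(m), 0)] for m in msgs]))
X_with_pol = hstack([X_bow, pol])   # bag-of-words + polarity column

for X in (X_bow, X_with_pol):       # filter without and with polarity
    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(X))
```
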
A Context-Aware Keyboard Generator for Smartphone Using Random Forest and Rule-Based System

A soft keyboard is a popular way to input text on a smartphone display. As it is rendered on screen, it has the advantage that it can be easily changed, unlike a hardware keyboard. An adaptive soft keyboard is needed because different types of people use smartphones in various situations. In this paper, we propose a hybrid system that predicts user behavior patterns from smartphone sensor log data using a random forest, and generates the GUI appropriate to the predicted behavior patterns using rules constructed from users' preferences. The random forest for predicting user behavior patterns has high generalization performance due to the ensemble of various decision trees. The GUI mapping rules are constructed from data collected from 210 users of different ages and genders. Experimental results with real log data confirm that the proposed system effectively recognizes the situations, and user satisfaction is doubled compared to conventional methods.

Sang-Muk Jo, Sung-Bae Cho
Privacy Preserving Data Mining for Deliberative Consultations

In deliberative consultations, which use electronic surveys as a tool to obtain information from residents, preserving privacy plays an important role. In this paper, the possibility of applying privacy preserving data mining techniques in deliberative consultations is investigated. Three main privacy preserving techniques, namely heuristic-based, reconstruction-based, and cryptography-based, have been analysed, and a setup for online surveys performed within deliberative consultations has been proposed. This work can be useful for designers and administrators in assessing the privacy risks they face with a system for deliberative consultations. It can also be used in the process of incorporating privacy preservation into such a system in order to help minimise privacy risks to users.

Piotr Andruszkiewicz
Feature Selection Using Approximate Multivariate Markov Blankets

In classification tasks, feature selection has become an important research area. In general, the performance of a classifier is intrinsically affected by the existence of irrelevant and redundant features. Markov blanket discovery can be used to identify an optimal subset of features. The Approximate Markov blanket (AMb) is a standard approach to induce Markov blankets from data. However, this approach considers only pairwise comparisons of features. In this paper, we introduce a multivariate approach to the AMb definition, called the Approximate Multivariate Markov blanket (AMMb), which takes into account interactions among the different features of a given subset. In order to test the AMMb, we consider a backward strategy similar to the Fast Correlation Based Filter (FCBF), which incorporates our proposal. The resulting algorithm, named FCBFntc, is compared against FCBF, Best First (BF) and Sequential Forward Selection (SFS) and tested on both synthetic and real-world datasets. Results show that the inclusion of interactions among the features in a subset may yield smaller subsets of features without degrading the classification task.

Rafael Arias-Michel, Miguel García-Torres, Christian Schaerer, Federico Divina
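
A sketch of the pairwise Approximate Markov blanket (AMb) test that FCBF builds on, using symmetrical uncertainty SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)) over discrete features. The multivariate AMMb proposed in the paper would evaluate feature subsets rather than single features; this shows only the classical pairwise baseline.

```python
import numpy as np
from collections import Counter

def entropy(x):
    p = np.array(list(Counter(x).values()), dtype=float)
    p /= p.sum()
    return -(p * np.log2(p)).sum()

def su(x, y):
    """Symmetrical uncertainty between two discrete variables."""
    mi = entropy(x) + entropy(y) - entropy(list(zip(x, y)))  # I(X; Y)
    denom = entropy(x) + entropy(y)
    return 2.0 * mi / denom if denom else 0.0

def is_amb(fj, fi, c):
    """Does feature fj form an approximate Markov blanket of fi w.r.t. class c?"""
    return su(fj, c) >= su(fi, c) and su(fi, fj) >= su(fi, c)

c  = [0, 0, 1, 1, 0, 1]
f1 = [0, 0, 1, 1, 0, 1]   # perfectly predictive of c
f2 = [0, 0, 1, 1, 1, 1]   # noisy copy of f1
print(is_amb(f1, f2, c))  # True: f2 is redundant given f1
```
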
Student Performance Prediction Applying Missing Data Imputation in Electrical Engineering Studies Degree

Nowadays, student performance and its evaluation is a general challenge. Frequently, the students' scores for a specific curriculum contain several gaps, for different reasons. In this context, the lack of data for any of the student scores adversely affects any future analysis to be done in order to reach conclusions. When this occurs, a data imputation process must be performed to substitute the missing data with estimated values. This paper presents a comparison between two data imputation methods developed by the authors in previous research: the Adaptive Assignation Algorithm (AAA), based on Multivariate Adaptive Regression Splines (MARS), and a technique called Multivariate Imputation by Chained Equations (MICE). The results obtained demonstrate that the proposed methods yield good results, especially the AAA algorithm.

Concepción Crespo-Turrado, José Luis Casteleiro-Roca, Fernando Sánchez-Lasheras, José Antonio López-Vázquez, Francisco Javier de Cos Juez, José Luis Calvo-Rolle, Emilio Corchado
Accuracy Increase on Evolving Product Unit Neural Networks via Feature Subset Selection

A framework that combines feature selection with evolutionary artificial neural networks is presented. This paper deals with neural networks applied to classification tasks. In machine learning, feature selection is one of the most common techniques for pre-processing the data. A set of filters has been taken into consideration to assess the proposal. The experimentation has been conducted on nine data sets from the UCI repository that report test error rates of about fifteen percent or above with reference classifiers such as C4.5 or 1-NN. The new proposal significantly improves the baseline framework, both approaches being based on evolutionary product unit neural networks. Several classifiers have also been tried in order to illustrate the performance of the different methods considered.

Antonio J. Tallón-Ballesteros, José C. Riquelme, Roberto Ruiz

Time Series

Frontmatter
Rainfall Prediction: A Deep Learning Approach

Previous work has shown that predicting meteorological conditions with methods based on artificial intelligence can achieve satisfactory results. Forecasts of meteorological time series can help the decision-making processes carried out by organizations responsible for disaster prevention. We introduce an architecture based on Deep Learning for predicting the accumulated daily precipitation for the next day. More specifically, it includes an autoencoder for reducing dimensionality and capturing non-linear relationships between attributes, and a multilayer perceptron for the prediction task. This architecture is compared with other previous proposals and demonstrates an improvement in the ability to predict the accumulated daily precipitation for the next day.

Emilcy Hernández, Victor Sanchez-Anguix, Vicente Julian, Javier Palanca, Néstor Duque
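
A hedged sketch of the two-stage architecture described above: an autoencoder compresses the meteorological attributes, and a multilayer perceptron predicts next-day accumulated precipitation from the compressed codes. Synthetic data and scikit-learn's MLPRegressor (trained to reconstruct its input) stand in for the paper's actual networks and weather series.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 20))                          # 20 meteorological attributes
y = X[:, :3].sum(axis=1) + 0.1 * rng.random(500)   # toy precipitation target

# Autoencoder: one hidden bottleneck, trained to reconstruct its input.
ae = MLPRegressor(hidden_layer_sizes=(6,), activation="tanh",
                  max_iter=2000, random_state=0).fit(X, X)

def encode(X):
    """Hidden-layer activations of the trained autoencoder (the codes)."""
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

# Predictor: an MLP trained on the 6-dimensional codes.
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(encode(X), y)
print(mlp.predict(encode(X[:3])))
```
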
Time Series Representation by a Novel Hybrid Segmentation Algorithm

Time series representation can be approached by segmentation genetic algorithms (GAs), with the purpose of automatically finding segments that approximate the time series with the lowest possible error. Although this is an interesting data mining field, obtaining the optimal segmentation of time series in different scopes is a very challenging task, so very accurate algorithms are needed. On the other hand, it is well known that GAs are relatively poor at finding the precise optimum in the region where they converge. Thus, this paper presents a hybrid GA including a local search method aimed at improving the quality of the final solution. The local search is based on two well-known algorithms: Bottom-Up and Top-Down. A real-world time series from the Spanish Stock Market (the IBEX35 index) and a synthetic database (Donoho-Johnstone) used in other studies were employed to test the proposed methodology.

Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez-Peña, Francisco José Martínez-Estudillo, César Hervás-Martínez
A Nearest Neighbours-Based Algorithm for Big Time Series Data Forecasting

A forecasting algorithm for big data time series is presented in this work. A nearest neighbours-based strategy is adopted as the main core of the algorithm. A detailed explanation of how to adapt and implement the algorithm to handle big data is provided. Although some parts remain iterative, and consequently require an enhanced implementation, execution times are considered satisfactory. The performance of the proposed approach has been tested on real-world data related to electricity consumption from a public Spanish university, using a Spark cluster.

Ricardo L. Talavera-Llames, Rubén Pérez-Chacón, María Martínez-Ballesteros, Alicia Troncoso, Francisco Martínez-Álvarez
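
A minimal sketch of nearest-neighbours forecasting as used above: find the k historical windows most similar to the last observed window and average their successors. The paper distributes this search on a Spark cluster; a plain NumPy version with a toy consumption series conveys the core idea.

```python
import numpy as np

def knn_forecast(series, w, h, k):
    """Forecast the next h values from windows of length w using k neighbours."""
    series = np.asarray(series, dtype=float)
    query = series[-w:]
    # Candidate windows must leave room for an h-step successor.
    starts = range(len(series) - w - h + 1)
    dists = [(np.linalg.norm(series[s:s + w] - query), s) for s in starts]
    nearest = sorted(dists)[:k]
    return np.mean([series[s + w:s + w + h] for _, s in nearest], axis=0)

hourly_load = np.sin(np.arange(24 * 60) * 2 * np.pi / 24) + 5  # toy consumption
print(knn_forecast(hourly_load, w=24, h=6, k=3))
```
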
Active Learning Classifier for Streaming Data

This work reports research on an active learning approach applied to data stream classification. The chosen characteristics of the proposed frameworks were evaluated on the basis of a wide range of computer experiments carried out on three benchmark data streams. The obtained results confirmed the usability of the proposed method for data stream classification in the presence of incremental concept drift.

Michał Woźniak, Bogusław Cyganek, Andrzej Kasprzak, Paweł Ksieniewicz, Krzysztof Walkowiak

Bio-inspired Models and Evolutionary Computation

Frontmatter
Application of Genetic Algorithms and Heuristic Techniques for the Identification and Classification of the Information Used by a Recipe Recommender

Most existing applications for locating and retrieving information are currently oriented towards offering personalized recommendations using well-known recommender techniques such as content-based or collaborative filtering. Nevertheless, automatic information retrieval approaches still lack efficient analysis, integration and adaptation of the retrieved information. This can be observed mainly when information comes from different sources. In this setting, the application of intelligent techniques offers an interesting approach for solving this kind of complex process. This paper employs an evolutionary approach in order to improve the retrieval of correct nutritional information about ingredients in an on-line recommender system for cooking recipes. The proposed algorithm has been tested on real data. Moreover, some heuristics have been included in order to improve the obtained results.

Cristian Peñaranda, Soledad Valero, Vicente Julian, Javier Palanca
A New Visualization Tool in Many-Objective Optimization Problems

During the past decade, developments in the fields of multi-objective optimization (MOO) and multi-criteria decision-making (MCDM) have led to so-called many-objective optimization problems (many-MOO), which involve from half a dozen to a few dozen simultaneous objectives. Many algorithms have been proposed to approach the scalability issues involved in solving many-MOO problems. One of these issues is the visualization of solutions, and of the relations between them, in a high-dimensional objective space. In this paper we introduce a new visualization tool that better illustrates the behavior of and relations between objectives, in order to assist the decision-maker's understanding of the problem. The understanding provided by the proposed tool can be used to redesign the optimization problem and possibly reduce the number of objectives or transform some of them into constraints, leading to an iterative and interactive design-and-optimize cycle.

Roozbeh Haghnazar Koochaksaraei, Rasul Enayatifar, Frederico Gadelha Guimarães
A Novel Adaptive Genetic Algorithm for Mobility Management in Cellular Networks

Metaheuristics are promising tools for addressing optimisation problems. On the other hand, most of them are hand-tuned through a long and exhaustive process. In fact, this task requires advanced knowledge about the algorithm used and the problem treated. This constraint restricts their use to pure abstract scientific research and to expert users; in such a context, their further application by non-experts in real-life fields would be impossible. A promising solution to this issue is the inclusion of adaptation within the search process of these algorithms. On the basis of this idea, this paper demonstrates that simple adaptation strategies can lead to algorithms that are more flexible for real-world fields, more efficient than hand-tuned ones, and more usable by non-expert users. Seven variants of the Genetic Algorithm (GA) based on different adaptation strategies are proposed. As a benchmark, an NP-complete real-world optimisation problem in advanced cellular networks, the mobility management task, is used to assess the efficiency of the proposed variants. These were compared against a state-of-the-art algorithm, the Differential Evolution algorithm (DE), and showed promising results.

Zakaria Abd El Moiz Dahi, Chaker Mezioud, Enrique Alba
Bio-Inspired Algorithms and Preferences for Multi-objective Problems

Multi-objective evolutionary optimization algorithms have been applied to solve many real-life decision problems, most of which require the management of trade-offs between multiple objectives. Reference point approaches highlight a preferred set of solutions in relevant areas of the Pareto frontier and support decision makers in making more confident evaluations. This paper extends some well-known algorithms to work with collective preferences and interactive techniques. In order to analyse the results driven by the online reference points, two new performance indicators are introduced and tested on some synthetic problems.

Daniel Cinalli, Luis Martí, Nayat Sanchez-Pi, Ana Cristina Bicharra Garcia
Assessment of Multi-Objective Optimization Algorithms for Parametric Identification of a Li-Ion Battery Model

The identification of intelligent models of Li-Ion batteries is a major issue in Electrical Vehicular Technology. On the one hand, the fitness of such models depends on the recursive evaluation of a set of nonlinear differential equations over a representative path in the state space, which is a time-consuming task. On the other hand, battery models are intrinsically unstable, and small differences in the initial state or the system, as well as imprecisions in the parameter values, may trigger large differences in the output. Hence, learning battery models from data is a complex multi-modal problem, and the parameters of these models must be determined with high accuracy. In addition, producing a dynamical model of a battery is a multi-criteria problem, because the predictive capabilities of the model must be estimated in both the voltage and the temperature domains. In this paper, a selection of state-of-the-art Multi-Objective Optimization Algorithms (SPEA2, NSGA-II, OMOPSO, NSGA-III and MOEA/D) is assessed with regard to their suitability for identifying a model of a Li-Ion battery. The dominance relations that occur between the Pareto fronts are discussed in terms of binary additive ε-quality indicators. It is concluded that each of the standard implementations of these algorithms has different issues with this particular problem, MOEA/D and NSGA-III being the best overall alternatives.

Yuviny Echevarría, Luciano Sánchez, Cecilio Blanco
Comparing ACO Approaches in Epilepsy Seizures

Epilepsy is a neurological illness causing disturbances in the nervous system. In recent studies, a wearable device has been developed and a Hybrid Artificial Intelligent System has been proposed for enhancing the anamnesis in the case of new patients or patients with severe convulsions. Among the different Artificial Intelligence techniques proposed during the last years for Epilepsy Convulsions Identification (ECI), Ant Colony Optimization (ACO) has been found to be one of the most efficient alternatives for learning Fuzzy Rule Based Classifiers (FRBC) to tackle this problem. This study proposes a comparison of two different ACO-based learning strategies: Pittsburgh FRBC learning by means of Ant Colony Systems (ACS), and Michigan FRBC learning using the Ant-Miner+ algorithm. Different alternatives for both strategies are also analyzed. The obtained results show Pittsburgh ACS learning to be a very promising solution for myoclonic ECI. The Ant-Miner+ based Michigan strategy does not perform well in this research, which is mainly due to the reduced number of features considered in the experimentation.

Paula Vergara, José R. Villar, Enrique de la Cal, Manuel Menéndez, Javier Sedano
Estimating the Maximum Power Delivered by Concentrating Photovoltaics Technology Through Atmospheric Conditions Using a Differential Evolution Approach

Concentrating Photovoltaic technology focuses on generating electricity while reducing the associated costs. Its main characteristic is concentrating sunlight onto solar cells by means of optical devices made of materials such as plastic or glass. This technology could contribute several benefits to our environment. This paper presents a new study of Concentrating Photovoltaic technology through the analysis of the solar spectrum, considering the impact of the spectral distribution of the direct normal irradiance. To this end, estimations of the regression coefficients for the spectral matching ratio multivariable regression and for the average photon energy multivariable regression are obtained through a differential evolution approach. The accurate calculation of the model parameters reveals relations among the atmospheric conditions that are very useful for the experts.

Cristobal J. Carmona, F. Pulgar, Antonio Jesús Rivera-Rivas, Maria Jose del Jesus, J. Aguilera
A Hybrid Bio-inspired ELECTRE Approach for Decision Making in Purchasing Agricultural Equipment

Agricultural management is an interdisciplinary endeavour. This paper discusses decision making in purchasing agricultural equipment. A methodological hybrid of bio-inspired techniques and the ELECTRE I method is proposed, and it is shown how such a model can be used to produce a complete ranking. The proposed hybrid bio-inspired ELECTRE I method is applied to a real-world data set. The experimental results in our research compare well with the PROMETHEE II method presented in our previous study.

Dragan Simić, Jovana Gajić, Vladimir Ilin, Vasa Svirčević, Svetlana Simić

Learning Algorithms

Frontmatter
Evaluating the Difficulty of Instances of the Travelling Salesman Problem in the Nearby of the Optimal Solution Based on Random Walk Exploration

Combinatorial optimization is one of the main research areas in Evolutionary Computing and Operational Research, and the Travelling Salesman Problem is one of its most popular problems. The never-ending quest of researchers for new and more difficult combinatorial problems to stress their evolutionary algorithms leads to investigating how to measure the difficulty of Travelling Salesman Problem instances. By developing methodologies for separating easy from difficult instances, researchers can be confident about the performance of their algorithms. In this proof of concept, a methodology for evaluating the difficulty of instances of the Travelling Salesman Problem in the vicinity of the optimal solution is proposed. This methodology is based on the use of Random Walks to explore the area close to the optimal solution. Instances with a more pronounced gradient towards the optimal solution may be considered easier than instances exhibiting an almost null gradient. The exploration of this gradient is done by starting from the optimal tour and modifying it with a Random Walk process. The aim is to propose a methodology for evaluating the difficulty of Travelling Salesman Problem instances which can also be applied to instances of other combinatorial problems. As a result of this work, the proposed methodology is confronted with a wide set of instances, and a ranking of their difficulty is finally stated.

Miguel Cárdenas-Montes
A Nearest Hyperrectangle Monotonic Learning Method

We can find real prediction learning problems whose class attribute is represented by ordinal values that should increase with some of the explaining attributes. These are known as classification problems with monotonicity constraints. In this contribution, our goal is to formalize the nearest hyperrectangle learning approach to manage monotonicity constraints. The idea behind it is to retain objects in R^n, which can be either single points or hyperrectangles (i.e., rules), in a combined model. The approach is checked with an experimental analysis involving a wide range of monotonic data sets. The reported results, verified by nonparametric statistical tests, show that our approach is very competitive with well-known techniques for monotonic classification.

Javier García, José-Ramón Cano, Salvador García
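
A sketch of the nearest-hyperrectangle rule underlying the approach above: the distance from a query point to an axis-aligned hyperrectangle is zero inside it and otherwise accumulates the per-dimension gap to the nearest face. The monotonicity-constraint handling the paper formalizes is omitted here.

```python
import numpy as np

def rect_distance(x, lower, upper):
    """Euclidean distance from point x to the axis-aligned box [lower, upper]."""
    x, lower, upper = map(np.asarray, (x, lower, upper))
    gap = np.maximum(lower - x, 0) + np.maximum(x - upper, 0)
    return np.linalg.norm(gap)

def classify(x, rectangles):
    """rectangles: (lower, upper, label) triples; single points have lower == upper."""
    return min(rectangles, key=lambda r: rect_distance(x, r[0], r[1]))[2]

rects = [((0, 0), (1, 1), "low"), ((2, 2), (4, 4), "high")]
print(classify((0.5, 0.8), rects))  # inside the first box -> "low"
print(classify((3, 5), rects))      # nearest to the second box -> "high"
```
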
Knowledge Modeling by ELM in RL for SRHT Problem

Single Robot Hose Transport (SRHT) is a limit case of Linked Multicomponent Robotic Systems (L-MCRS), in which one robot moves the tip of a hose to a desired position while the other end of the hose is attached to a source position. Reinforcement Learning (RL) algorithms have been successfully applied to learn the robot control autonomously. However, RL algorithms produce large and intractable data structures. This paper addresses the problem by learning an Extreme Learning Machine (ELM) from the state-action value Q-table, obtaining a very relevant data reduction. We empirically evaluate a classification strategy to formulate the ELM learning so as to provide approximations to the Q-table, obtaining very promising results.

Jose Manuel Lopez-Guede, Asier Garmendia, Manuel Graña
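
A minimal Extreme Learning Machine sketch matching the idea above: random, fixed hidden weights plus a single pseudoinverse solve for the output weights, here used to compress a state-action value Q-table. Shapes and data are illustrative assumptions, and the paper's classification-based formulation is simplified to regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_hidden = 1000, 4, 50

states = rng.random((n_states, 8))            # 8-dimensional state features
q_table = rng.random((n_states, n_actions))   # values learned beforehand by RL

# Hidden layer: random projection, never trained.
W, b = rng.normal(size=(8, n_hidden)), rng.normal(size=n_hidden)
H = np.tanh(states @ W + b)

# Output weights: one least-squares solve via the Moore-Penrose pseudoinverse.
beta = np.linalg.pinv(H) @ q_table

def q_values(state):
    """Approximate Q-row for a state, replacing the full table."""
    return np.tanh(state @ W + b) @ beta

print(q_values(states[0]).round(3), q_table[0].round(3))
```
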
Can Metalearning Be Applied to Transfer on Heterogeneous Datasets?

Machine learning processes consist of collecting data, obtaining a model and applying it to a given task. Given a new task, the standard approach is to restart the learning process and obtain a new model. However, previous learning experience can be exploited to assist the new learning process. The two most studied approaches for this are metalearning and transfer learning. Metalearning can be used for selecting the predictive model to use on a new dataset. Transfer learning allows the reuse of knowledge from previous tasks. However, when multiple heterogeneous tasks are available as potential sources for transfer, the question is which one to use. One approach to addressing this problem is metalearning, and in this paper we investigate its feasibility. We propose a method to transfer weights from a trained source neural network to initialize a network that models a potentially very different target dataset. Our experiments with 14 datasets indicate that this method enables faster convergence without a significant difference in accuracy, provided that the source task is adequately chosen. This means that there is potential for applying metalearning to support transfer between heterogeneous datasets.

Catarina Félix, Carlos Soares, Alípio Jorge
Smart Sketchpad: Using Machine Learning to Provide Contextually Relevant Examples to Artists

Sketching is a way for artists to generate ideas quickly, explore alternatives with less risk, and encourage discussion. How might computational tools amplify the abilities of artists? This paper introduces Smart Sketchpad, a digital sketchpad that uses machine learning to identify what is being sketched. The sketchpad then shows example images, color palettes, and subject information. The goal of Smart Sketchpad is to increase an artist's ability to get ideas down with higher fidelity by making it easier to reference and include existing example works. Our study compares traditional sketching on a phone to Smart Sketchpad. We found that introducing examples during the sketching process leads to higher satisfaction with the sketch, both by the artist and by an external expert.

Michael Fischer, Monica Lam
An Analysis of the Hardness of Novel TSP Iberian Instances

The scope of this paper is to introduce two novel TSP instances based on the freely available geographic coordinates of the main cities of Spain and Portugal. For the described instances, we analyze the hardness, the quality of the provided solutions and the corresponding running times, using the Lin-Kernighan heuristic algorithm with different starting solutions and Applegate et al.'s branch-and-cut algorithm.

Gloria Cerasela Crişan, Camelia-M. Pintea, Petrică Pop, Oliviu Matei
A Data Structure to Speed-Up Machine Learning Algorithms on Massive Datasets

Processing data in a fast and efficient way is an important capability in machine learning, especially with the growing interest in data storage. The exponential increase in data size has hampered traditional techniques for data analysis and processing, giving rise to a new set of methodologies under the term Big Data. Many efficient machine learning algorithms have been proposed that address time and main memory requirements. Nevertheless, the process can still become hard when the number of features or records is extremely high. In this paper, the goal is not to propose new efficient algorithms but a new data structure that can be used by a variety of existing algorithms without modifying their original schemata. Moreover, the proposed data structure enables sparse datasets to be massively reduced, efficiently transforming the input data into the new structure. The results demonstrate that the proposed data structure is highly promising, reducing the amount of storage needed and improving query performance.

Francisco Padillo, J. M. Luna, Alberto Cano, Sebastián Ventura
A Sensory Control System for Adjusting Group Emotion Using Bayesian Networks and Reinforcement Learning

The relationship between sensory stimuli and emotion has been actively investigated, but determining appropriate stimuli for inducing a target emotion in a group of people sharing the same space, such as a school, hospital or store, remains relatively unexplored. In this paper, we propose a stimuli control system to adjust group emotion in a closed space, specifically a kindergarten. The proposed system predicts the next emotion of a group of people using modular tree-structured Bayesian networks, and controls the stimuli appropriate to the target emotion using a utility table initialized with domain knowledge and adapted by reinforcement learning as cases of stimuli and emotions accumulate. To evaluate the proposed system, real data were collected for five days from a kindergarten where the sensor and stimulus devices were installed. We obtained 84 % prediction accuracy and 56.2 % stimuli control accuracy. Moreover, in scenario tests on math and music classes, we could control the stimuli to fit the target emotion with 63.2 % and 76.3 % accuracy, respectively.

Jun-Ho Kim, Ki-Hoon Kim, Sung-Bae Cho

Video and Image

Frontmatter
Identification of Plant Textures in Agricultural Images by Principal Component Analysis

In precision agriculture, the extraction of the green parts of an image is a very important task. One of the biggest issues in computer vision is image segmentation, which motivated the research conducted in this work. Our goal is the segmentation of the vegetative and soil parts of the images. For this purpose, a novel segmentation method is defined in which different vegetation indices are calculated and, through component reduction by principal component analysis (PCA), an enhanced greyscale image is obtained. Finally, by Otsu thresholding, we binarize the greyscale image, isolating the green parts from the other elements in the image.

Martín Montalvo, María Guijarro, José Miguel Guerrero, Ángela Ribeiro
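
A hedged sketch of the pipeline described above: compute a few vegetation indices per pixel, project them onto the first principal component to get an enhanced greyscale image, then binarize it with Otsu thresholding. The index choices here (ExG, ExGR, green-red difference) are common examples, not necessarily the paper's exact set.

```python
import numpy as np
from skimage.filters import threshold_otsu

def greenness_mask(rgb):
    r, g, b = [rgb[..., i].astype(float) / 255.0 for i in range(3)]
    exg = 2 * g - r - b            # excess green
    exgr = exg - (1.4 * r - g)     # excess green minus excess red
    grd = g - r                    # green-red difference
    X = np.stack([exg, exgr, grd], axis=-1).reshape(-1, 3)
    X = X - X.mean(axis=0)
    # First principal component of the index images (PCA via SVD). The sign
    # of the component may need flipping so that vegetation scores high.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    grey = (X @ vt[0]).reshape(rgb.shape[:2])
    return grey > threshold_otsu(grey)

rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in image
print(greenness_mask(rgb).mean())  # fraction of pixels labelled as vegetation
```
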
Automatic Image-Based Method for Quantitative Analysis of Photosynthetic Cell Cultures

This work deals with the automatic quantitative analysis of photosynthetic cell cultures. It uses images captured by a confocal fluorescence microscope to automatically determine the number of cells in samples containing complex 3D structures of cell clusters. Experiments were performed on the Leica TCS SP8 X confocal microscope. The cell nuclei were stained with the SYBR® Green fluorescent DNA-binding marker. In the first step, we used a combination of adaptive thresholding methods to find the areas where nuclei were located. The proposed segmentation steps allowed the reduction of noise and artefacts. The Z-axis position was obtained as the location of the peak of the intensity profile. Finally, a model of the scene can be created by placing spheres of adequate diameter at the found 3D coordinates. The number of cells per volumetric unit was determined in structurally different culture samples of the cell cultures Chenopodium rubrum (Cr) and Solanum lycopersicum (To). The results were verified by manual counting.

Alzbeta Vlachynska, Jan Cerveny, Vratislav Cmiel, Tomas Turecek
Fall Detection Using Body-Worn Accelerometer and Depth Maps Acquired by Active Camera

In the presented system for person fall detection, a body-worn accelerometer is used to indicate a potential fall and a ceiling-mounted depth sensor is utilized to authenticate the fall alert. In order to expand the observation area, the depth sensor has been mounted on a pan-tilt motorized head. If the person's acceleration is above a preset threshold, the system uses a lying-pose detector and examines a dynamic feature to authenticate the fall. Thus, the more costly fall authentication is not executed frame-by-frame; instead, we fetch from a circular buffer a sequence of depth maps acquired prior to the fall and then process them to confirm the fall alert. We show that promising results in terms of sensitivity and specificity can be obtained on the publicly available UR Fall Detection dataset.

Michal Kepski, Bogdan Kwolek
Classification of Melanoma Presence and Thickness Based on Computational Image Analysis

Melanoma is a type of cancer that occurs on the skin. In the US alone, 50,000–100,000 patients are diagnosed with melanoma every year. The five-year survival rate highly depends on early detection, varying between 99 % and 15 % depending on the melanoma stage. Melanoma is typically identified by visual inspection and later confirmed and classified by a biopsy. In this work, we propose a hybrid system combining features which describe melanoma images with machine learning models that learn to distinguish melanoma lesions. Although previous works distinguish melanoma and non-melanoma images, they focus only on the binary case. In contrast, we propose finer classification levels within a five-class learning problem. We evaluate the performance of several nominal and ordinal classifiers using four performance metrics to highlight several aspects of classification performance, achieving promising results.

Javier Sánchez-Monedero, Aurora Sáez, María Pérez-Ortiz, Pedro Antonio Gutiérrez, Cesar Hervás-Martínez

Classification and Cluster Analysis

Frontmatter
Solution to Data Imbalance Problem in Application Layer Anomaly Detection Systems

Currently, we can observe an increasing number of successful cyber attacks that exploit vulnerable web pages, allowing the hacker (or cracker) to breach the network security (e.g. to deliver malicious content). This trend is caused by the complexity and diversity of web applications, which make it difficult to provide effective and efficient cyber security countermeasures. Moreover, there are many different obfuscation techniques that allow the attacker to evade signature-based attack detection mechanisms. Therefore, in this paper we propose a machine-learning web-layer anomaly detection system that adapts our algorithm for packet segmentation and an ensemble of REPTree classifiers. In our experiments we show that this approach can substantially increase the effectiveness of cyber attack detection. Moreover, we present a solution to counter the data imbalance problem in cyber security.

Rafał Kozik, Michał Choraś
Ordinal Evolutionary Artificial Neural Networks for Solving an Imbalanced Liver Transplantation Problem

Ordinal regression considers classification problems where there exists a natural ordering among the categories. In this learning setting, threshold models are one of the most used and successful techniques. On the other hand, liver transplantation is a widely used treatment for patients with a terminal liver disease. This paper considers the survival time of the recipient to perform an appropriate donor-recipient matching, which is a highly imbalanced classification problem. An artificial neural network model applied to ordinal classification is used, combining evolutionary and gradient-descent algorithms to optimize its parameters, together with an ordinal over-sampling technique. The evolutionary algorithm applies a modified fitness function able to deal with the ordinal and imbalanced nature of the dataset. The results show that the proposed model leads to competitive performance for this problem.

Manuel Dorado-Moreno, María Pérez-Ortiz, María Dolores Ayllón-Terán, Pedro Antonio Gutiérrez, Cesar Hervás-Martínez
A Fuzzy-Based Approach for the Multilevel Component Selection Problem

Component-based Software Engineering uses components to construct systems, providing a means to increase productivity by promoting software reuse. This work deals with the Component Selection Problem in a multilevel system structure. A fuzzy-based approach is used to construct the required system starting from the set of requirements, using both functional and non-functional requirements. For each selection step, the fuzzy clustering approach groups similar components in order to select the best candidate component that provides the needed interfaces. To evaluate our approach, we discuss a case study on building a Reservation System. We compare the fuzzy-based approach with an evolutionary-based approach using a metric that assesses the overall architecture of the obtained systems from the coupling and cohesion perspective.

Andreea Vescan, Camelia Şerban
A Clustering-Based Method for Team Formation in Learning Environments

Teamwork is acquiring a growing relevance in learning environments. In many cases, it is a useful and meaningful way to organize learning activities and improve their outcomes. Moreover, the related skills are needed for the professional life of students. This situation makes support techniques to manage the different aspects of teamwork in educational settings necessary. A key aspect is the organization of teams. The literature offers alternatives for assessing students and making up teams, but they focus on particular and isolated aspects. This work proposes a novel methodology to develop tailored student assessments from the integration of multiple pre-existing evaluation techniques. These techniques evaluate features belonging to students' profiles, and the results are used to create groups that improve their learning experience. The process is based on clustering techniques and has three stages. In the first one, lecturers identify the features they consider relevant in their context (e.g. leadership, ability to communicate, or spatial skills) and tests to assess them. Then, the training stage identifies the combinations of feature values from those tests that characterize high-performance teams, i.e. teams whose group learning results are above the average in their context. Finally, the classification stage uses those values to determine which students should belong to which teams, trying to replicate the distribution of student profiles in the best teams, and thus their results. The paper reports the experiments performed so far to evaluate the method in a computer engineering school.

Marta Guijarro-Mata-García, Maria Guijarro, Rubén Fuentes-Fernández
R Ultimate Multilabel Dataset Repository

Multilabeled data is everywhere on the Internet. From news on digital media and entries published in blogs, to videos hosted on YouTube, every object is usually tagged with a set of labels, so that it can be categorized into several non-exclusive groups. However, publicly available multilabel datasets (MLDs) are not so common: a handful of websites provide a few of them, using disparate file formats. Finding proper MLDs, converting them into the correct format and locating the appropriate bibliographic data to cite them are some of the difficulties usually faced by researchers and practitioners. In this paper RUMDR (R Ultimate Multilabel Dataset Repository), a new multilabel dataset repository aimed at fusing all public MLDs, is introduced, along with mldr.datasets, an R package which eases the process of retrieving MLDs and their bibliographic information, exporting them to the desired file formats and partitioning them.

Francisco Charte, David Charte, Antonio Rivera, María José del Jesus, Francisco Herrera
On the Impact of Dataset Complexity and Sampling Strategy in Multilabel Classifiers Performance

Multilabel classification (MLC) is an increasingly widespread data mining technique. Its goal is to categorize patterns into several non-exclusive groups, and it is applied in fields such as news categorization, image labeling and music classification. Comparatively speaking, MLC is a more complex task than multiclass and binary classification, since the classifier must learn the presence of several outputs at once from the same set of predictive variables. The very nature of the data the classifier has to deal with implies a certain degree of complexity. How to measure this complexity level strictly from the data characteristics is an interesting objective. At the same time, the strategy used to partition the data also influences the sample patterns the algorithm has at its disposal to train the classifier. In MLC, random sampling is commonly used to accomplish this task. This paper introduces TCS (Theoretical Complexity Score), a new characterization metric aimed at assessing the intrinsic complexity of a multilabel dataset, as well as a novel stratified sampling method specifically designed to fit the traits of multilabeled data. A detailed description of both proposals is provided, along with empirical results on their suitability for their respective duties.

Francisco Charte, Antonio Rivera, María José del Jesus, Francisco Herrera
Managing Monotonicity in Classification by a Pruned AdaBoost

In classification problems with ordinal monotonicity constraints, the class variable should increase in accordance with a subset of the explanatory variables. Models generated by standard classifiers are not guaranteed to fulfill these monotonicity constraints. Therefore, some algorithms have been designed to deal with these problems. In the particular case of decision trees, the growing and pruning mechanisms have been modified in order to produce monotonic trees. Recently, ensembles have also been adapted to this problem, providing a good trade-off between accuracy and degree of monotonicity. In this paper we study the behaviour of these decision tree mechanisms built on an AdaBoost scheme. We combine these techniques with a simple ensemble pruning method based on the degree of monotonicity. After an exhaustive experimental analysis, we conclude that the resulting AdaBoost achieves better predictive performance than standard algorithms while also respecting the monotonicity restriction.

Sergio González, Francisco Herrera, Salvador García
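
A sketch of one plausible monotonicity-degree criterion for the pruning step discussed above: score each ensemble member by the fraction of comparable instance pairs it orders non-monotonically, then drop the worst members. The paper's exact pruning rule may differ; this is an illustrative assumption.

```python
import numpy as np
from itertools import combinations

def non_monotonicity(X, preds):
    """Fraction of comparable pairs (x_a <= x_b feature-wise) with preds[a] > preds[b]."""
    comparable = violations = 0
    for i, j in combinations(range(len(X)), 2):
        for a, b in ((i, j), (j, i)):
            if np.all(X[a] <= X[b]):
                comparable += 1
                violations += preds[a] > preds[b]
    return violations / comparable if comparable else 0.0

X = np.array([[1, 1], [2, 2], [3, 3]])
print(non_monotonicity(X, np.array([0, 1, 1])))  # 0.0: fully monotone
print(non_monotonicity(X, np.array([1, 0, 1])))  # 1/3: one violated pair
```
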
Model Selection for Financial Distress Prediction by Aggregating TOPSIS and PROMETHEE Rankings

Many models have been explored for financial distress prediction, but no consistent conclusions have been drawn on which method behaves best when different performance evaluation measures are employed. Accordingly, this paper proposes the integration of the ranking scores given by two popular multiple-criteria decision-making tools as an important step to help decision makers select the model(s) properly. Selecting the most appropriate prediction method is here shaped as a multiple-criteria decision-making problem that involves a number of performance measures (criteria) and a set of techniques (alternatives). An empirical study is carried out to assess the performance of ten algorithms over six real-life bankruptcy and credit risk databases. The results reveal that the use of a single performance measure often leads to contradictory conclusions, while the multiple-criteria decision-making techniques may yield a more reliable analysis. Besides, these techniques allow decision makers to weight the relevance of the individual performance metrics as a function of each particular problem.

Vicente García, Ana I. Marqués, L. Cleofas-Sánchez, José Salvador Sánchez
Combining k-Nearest Neighbor and Centroid Neighbor Classifier for Fast and Robust Classification

The k-NN classifier is one of the best known and most widely used nonparametric classifiers. The k-NN rule is optimal in the asymptotic case, which means that its classification error approaches the Bayes error as the number of training samples approaches infinity. Many extensions of the traditional k-NN have been developed to improve classification accuracy. However, it is also a well-known fact that when the number of samples grows, k-NN can become very inefficient, because all the distances from the testing sample to every sample in the training data set must be computed. In this paper, a simple method which addresses this issue is proposed. Combining the k-NN classifier with the centroid neighbor classifier improves the speed of the algorithm without changing the results of the original k-NN. In fact, using confusion matrices and excluding outliers makes the resulting algorithm much faster and more robust.

Wiesław Chmielnicki
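
A hedged sketch of one plausible reading of the combination described above: distances to class centroids first narrow the candidate training samples, and plain k-NN then votes on that reduced set. The paper's actual filtering (with confusion matrices and outlier exclusion) is richer than this illustration.

```python
import numpy as np
from collections import Counter

def fit_centroids(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(x, X, y, centroids, k=3, n_classes=2):
    # Keep only the n_classes classes whose centroids are closest to x ...
    near = sorted(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))[:n_classes]
    mask = np.isin(y, near)
    # ... and run the ordinary k-NN vote on the surviving samples.
    d = np.linalg.norm(X[mask] - x, axis=1)
    votes = y[mask][np.argsort(d)[:k]]
    return Counter(votes).most_common(1)[0][0]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in (0, 3, 6)])
y = np.repeat([0, 1, 2], 30)
cents = fit_centroids(X, y)
print(predict(np.array([2.9, 3.1]), X, y, cents))  # expected class 1
```
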
A First Study on the Use of Boosting for Class Noise Reparation

Class noise refers to the incorrect labeling of examples in classification, and is known to negatively affect the performance of classifiers. In this contribution, we propose a boosting-based hybrid algorithm that combines data removal and data reparation to deal with noisy instances. An experimental procedure to compare its performance against no preprocessing is developed and analyzed, laying the foundations for future work.

Pablo Morales Álvarez, Julián Luengo, Francisco Herrera
Ensemble of HOSVD Generated Tensor Subspace Classifiers with Optimal Tensor Flattening Directions

The paper presents a modified method for building ensembles of tensor classifiers for direct multidimensional pattern recognition in tensor subspaces. The novelty of the proposed solution is a method of lowering the tensor subspace dimensions by rotating the training patterns to their optimal directions. These are obtained by computing and analyzing phase histograms of the structural tensor computed from the training images. The proposed improvement allows for a significant increase in classification accuracy, which compares favorably to the best methods cited in the literature.

Bogusław Cyganek, Michał Woźniak, Dariusz Jankowski

Applications

Frontmatter
Evaluation of Decision Trees Algorithms for Position Reconstruction in Argon Dark Matter Experiment

Nowadays, the search for Dark Matter constitutes one of the most challenging scientific activities. During the last decades, several detectors have been developed to evidence the signal of interactions between Dark Matter and ordinary matter. The Argon Dark Matter detector, placed in the Canfranc Underground Laboratory in Spain, is the first ton-scale liquid-Ar experiment in operation for direct Dark Matter detection. In parallel to the development of other engineering issues, computational methods are being applied to maximize the exploitation of the generated data. In this work, two algorithms based on decision trees, Generalized Boosted Regression Models and Random Forests, are employed to reconstruct the position of the interaction in the Argon Dark Matter detector. These two algorithms are confronted with a Monte Carlo data set reproducing the physical behaviour of the Argon Dark Matter detector. An in-depth study of the position reconstruction of the interaction is performed for both algorithms, including a study of the distribution of errors.

Miguel Cárdenas-Montes, Bárbara Montes, Roberto Santorelli, Luciano Romero, on behalf of Argon Dark Matter Collaboration
A Preliminary Study of the Suitability of Deep Learning to Improve LiDAR-Derived Biomass Estimation

Light Detection and Ranging (LiDAR) is a remote sensor able to extract three-dimensional information about forest structure. Biophysical models have taken advantage of the use of LiDAR-derived information to improve their accuracy. Multiple Linear Regression (MLR) is the most common method in the biomass estimation literature for defining the relation between a set of field measurements and the statistics extracted from a LiDAR flight. Unfortunately, there are open issues regarding the generalization of models from one area to another, due to the lack of knowledge about the noise distribution, the relationships between statistical features, and the risk of overfitting. Autoencoders (a type of deep neural network) have recently been applied to improve the results of machine learning techniques by undoing possible data corruption processes and improving feature selection. This paper presents a preliminary comparison between the use of MLR with and without preprocessing by autoencoders on real LiDAR data from two areas in the province of Lugo (Galicia, Spain). The results show that autoencoders statistically increased the quality of the MLR estimations by around 15–30 %.

Jorge García-Gutiérrez, Eduardo González-Ferreiro, Daniel Mateos-García, José C. Riquelme-Santos
Fisher Score-Based Feature Selection for Ordinal Classification: A Social Survey on Subjective Well-Being

This paper approaches the problem of feature selection in the context of ordinal classification problems. To do so, an ordinal version of the Fisher score is proposed. We test this new strategy on data from a European social survey concerning subjective well-being, in order to understand and identify the most important variables for a person's happiness, which is represented using ordered categories. The input variables have been chosen according to previous research and categorised in the following groups: demographics, daily activities, social well-being, health and habits, community well-being and personality/opinion. The proposed strategy shows promising results and performs significantly better than its nominal counterpart, thereby validating the need for specific ordinal feature selection methods. Furthermore, the results of this paper can shed some light on the human psyche through an analysis of the most and least frequently selected variables.

María Pérez-Ortiz, Mercedes Torres-Jiménez, Pedro Antonio Gutiérrez, Javier Sánchez-Monedero, César Hervás-Martínez
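
A sketch of the classical Fisher score for feature selection, plus a simple ordinal weighting in the spirit of the proposal above, where class pairs that are far apart on the ordinal scale count more. The paper's actual ordinal Fisher score may be defined differently; the second function is only an assumption.

```python
import numpy as np

def fisher_score(x, y):
    """Classical (nominal) Fisher score of one feature x for labels y."""
    classes, mu = np.unique(y), x.mean()
    num = sum((y == c).sum() * (x[y == c].mean() - mu) ** 2 for c in classes)
    den = sum((y == c).sum() * x[y == c].var() for c in classes)
    return num / den if den else 0.0

def ordinal_fisher_score(x, y):
    """Weight between-class separation by the ordinal distance between classes."""
    classes = np.unique(y)
    num = sum((y == a).sum() * (y == b).sum() * abs(a - b)
              * (x[y == a].mean() - x[y == b].mean()) ** 2
              for a in classes for b in classes if a < b)
    den = sum((y == c).sum() * x[y == c].var() for c in classes)
    return num / den if den else 0.0

y = np.repeat([0, 1, 2], 20)                          # ordered happiness levels
x = y + np.random.default_rng(3).normal(0, 0.5, 60)   # informative feature
print(fisher_score(x, y), ordinal_fisher_score(x, y))
```
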
A Soft Computing Approach to Optimize the Clarification Process in Wastewater Treatment

The coagulation process allows for the removal of colloidal particles suspended in wastewater. The amount of coagulant required to effectively remove these colloidal particles is usually determined experimentally by the jar test. The configuration of this test is often performed in an iterative manner, which has the disadvantage of requiring a significant period of experimentation and an excessive amount of coagulant. This study proposes a methodology to determine the optimum natural coagulant dose while eliminating the maximum amount of colloidal particles suspended in the wastewater. An estimation of the amount of colloidal particles removed from the wastewater is given by the turbidity in a standardized jar test, which is applied to the wastewater at the wastewater treatment plant in Logroño (Spain). The proposed methodology is based on the combined use of soft computing techniques and evolutionary techniques based on Genetic Algorithms (GA). First, a group of regression models based on neural network techniques was built to predict the final turbidity of a wastewater sample given a configuration of jar test inputs: initial turbidity, natural coagulant dosage, temperature, mix speed and mix time. Finally, the combination of jar test inputs yielding the optimum natural coagulant dose, while also eliminating the maximum amount of colloidal particles, was found by applying evolutionary optimization techniques to the most accurate regression models obtained beforehand.

Marina Corral Bobadilla, Roberto Fernandez Martinez, Ruben Lostado Lorza, Fatima Somovilla Gomez, Eliseo P. Vergara Gonzalez
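
A toy version of the optimization step might look as follows; the quadratic turbidity model stands in for the paper's neural-network regressors, and the dose range, population size and mutation scale are invented for illustration:

```python
# Minimal GA sketch (assumed setup): search a fitted turbidity model for the
# coagulant dose that minimises the predicted final turbidity.
import numpy as np

rng = np.random.default_rng(0)

def predicted_turbidity(dose, initial_turbidity=120.0):
    # Stand-in for the neural-network regression model described in the paper.
    return (dose - 35.0) ** 2 / 50.0 + 0.05 * initial_turbidity

pop = rng.uniform(0.0, 100.0, size=40)          # candidate doses (mg/L)
for _ in range(60):
    fitness = -predicted_turbidity(pop)          # maximise negative turbidity
    parents = pop[np.argsort(fitness)][-20:]     # truncation selection
    children = parents + rng.normal(scale=2.0, size=20)  # Gaussian mutation
    pop = np.clip(np.concatenate([parents, children]), 0.0, 100.0)

print("best dose ~", pop[np.argmax(-predicted_turbidity(pop))])
```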
A Proposed Methodology for Setting the Finite Element Models Based on Healthy Human Intervertebral Lumbar Discs

The human intervertebral lumbar disc is a fibrocartilage structure located between the vertebrae of the spine. It consists of the nucleus pulposus, the annulus fibrosus and the cartilage endplate, and may be subjected to a complex combination of loads. The study of its mechanical properties and movement is used to evaluate medical devices and implants. Some researchers have used the Finite Element Method (FEM) to model the disc and study its biomechanics. Estimating the parameters that correctly define these models has the drawback that any small difference between the actual material and the FEM-based simulation model can be amplified enormously in the presence of nonlinearities. This paper sets out a fully automated method to determine the most appropriate material parameters for defining the behavior of FEM-based models of the human intervertebral lumbar disc. The proposed methodology is based on experimental data and the combined use of data mining techniques, Genetic Algorithms (GA) and the FEM. First, based on standard tests (compression, axial rotation, shear, flexion, extension and lateral bending), three-dimensional parameterized Finite Element (FE) models were generated. Then, over the parameters that define the proposed parameterized FE models, a Design of Experiments (DoE) was completed. For each of the standard tests, a regression technique based on Support Vector Machines (SVM) with different kernels was applied to model the stiffness and bulge of the intervertebral lumbar disc as the FE model parameters change. Finally, the best combination of parameters was found by applying GA-based evolutionary optimization to the best of the previously obtained regression models.

Fatima Somovilla Gomez, Ruben Lostado Lorza, Roberto Fernandez Martinez, Marina Corral Bobadilla, Ruben Escribano Garcia
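
The surrogate-modelling step can be sketched as follows; the synthetic "stiffness" response and the parameter sampling are assumptions standing in for the paper's DoE over FE simulation outputs:

```python
# Sketch under assumptions: SVR surrogates with different kernels fitted to
# an FE-style response, as done per standard test in the paper.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
params = rng.uniform(size=(150, 4))   # stand-in material parameters from a DoE
stiffness = params @ [2.0, -1.0, 0.5, 3.0] + 0.1 * rng.normal(size=150)

# Compare kernels by cross-validated R^2, keeping the most accurate surrogate.
for kernel in ("linear", "rbf", "poly"):
    score = cross_val_score(SVR(kernel=kernel), params, stiffness, cv=5).mean()
    print(kernel, round(score, 3))
```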
Passivity Based Control of Cyber Physical Systems Under Zero-Dynamics Attack

The introduction of computer networks into the communication and control of industrial setups has given rise to the concept of the Cyber Physical System. These control systems are vulnerable to stealthy cyber-attacks because they are connected to communication networks. Cyber-attacks can cause plants to malfunction, i.e., abnormal operational behavior, instability, etc., so it is of prime importance to control a unit under cyber-attack in order to protect the whole setup from damage. Passivity is a system property that enables the designer to devise energy-based control of general dynamic systems. In this paper, a passive architecture for Cyber Physical Systems is discussed. To prevent the excitation of a system's unstable zeros by stealthy attacks such as the zero-dynamics attack, a back-stepping technique is employed to stabilize the zero-dynamics of the closed-loop system. This technique is also used to develop a control law that prevents the system from reaching the unstable region when it is under a zero-dynamics attack.

Fawad Hassan, Naeem Iqbal, Francisco Martínez-Álvarez, Khawaja M. Asim
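
For reference, the textbook passivity condition the approach builds on (stated generically, not for the paper's specific plant): a system $$\dot{x} = f(x,u),\; y = h(x)$$ is passive if there exists a storage function $$V(x) \ge 0$$ such that

$$\dot{V}(x) \le u^{\top} y,$$

i.e., the stored energy grows no faster than the power supplied at the port. The zero-dynamics are the internal dynamics that remain when the output is held at $$y \equiv 0$$; a zero-dynamics attack injects inputs that keep the measured output at zero while exciting exactly these hidden modes, which is why stabilizing the closed-loop zero-dynamics (here via back-stepping) removes the attack's leverage.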
The Multivariate Entropy Triangle and Applications

We extend a framework for the analysis of classifiers so that it also encompasses the analysis of data sets. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, from bivariate to multivariate distributions. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how information gets transformed through machine learning classification tasks.

Francisco José Valverde-Albacete, Carmen Peláez-Moreno
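
An illustrative computation of the entropic quantities such a balance equation trades off, shown for the bivariate case with a made-up joint distribution:

```python
# Entropic quantities behind the Entropy Triangle, computed for a small
# joint distribution P(X, Y); purely illustrative numbers.
import numpy as np

P = np.array([[0.3, 0.1],
              [0.1, 0.5]])            # joint distribution P(X, Y)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))    # Shannon entropy in bits

Hx, Hy, Hxy = H(P.sum(1)), H(P.sum(0)), H(P.ravel())
MI = Hx + Hy - Hxy                    # mutual information
VI = Hxy - MI                         # variation of information H(X|Y)+H(Y|X)
print("MI =", MI, " VI =", VI)
```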
Motivational Engine with Sub-goal Identification in Neuroevolution Based Cognitive Robotics

A first approach towards a new motivational system for an autonomous robot that can learn chains of sub-goals leading to a final reward is proposed in this paper. The motivational system provides the motivation that guides the robot's operation according to its knowledge of its sensorial space, so that rewards are maximized during its lifetime. To this end, a motivational engine progressively and interactively creates an internal model of expected future reward (a value function) over areas of the robot's state space, through a neuroevolutionary process, using samples obtained from the sensorial (state space) traces the robot followed whenever it obtained a reward. To improve this modelling process, a strategy is proposed to decompose the global value function leading to the reward or goal into several more local ones, thus discovering sub-goals that simplify the whole learning process and can be reused in the future. The motivational engine is tested in a simulated experiment with very promising results.

Rodrigo Salgado, Abraham Prieto, Pilar Caamaño, Francisco Bellas, Richard J. Duro

Bioinformatics

Frontmatter
TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms

Triclustering has been shown to be a valuable tool for the analysis of microarray data since its appearance as an improvement of classical clustering and biclustering techniques. Triclustering relaxes the constraints for grouping and allows genes to be evaluated under a subset of experimental conditions and a subset of time points simultaneously. The authors previously presented a genetic algorithm, TriGen, that finds triclusters of gene expression data, and defined three different fitness functions for it: $$MSR_{3D}$$, LSL and MSL. In order to assess the results obtained by applying TriGen, a validity measure needs to be defined. Therefore, we present TRIQ, a validity measure which combines information from three different sources: (1) correlation among genes, conditions and times, (2) graphic validation of the extracted patterns and (3) functional annotations for the extracted genes.

David Gutiérrez-Avilés, Cristina Rubio-Escudero
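
A sketch of a 3D mean-squared-residue in the spirit of $$MSR_{3D}$$ follows; the exact normalisation used by TriGen may differ, so treat this as an assumed form:

```python
# Assumed 3D mean-squared residue: each entry is compared against its gene,
# condition and time means, as in the classical bicluster MSR lifted to 3D.
import numpy as np

def msr_3d(T):
    # T: expression subarray indexed (gene, condition, time)
    g = T.mean(axis=(1, 2), keepdims=True)   # per-gene means
    c = T.mean(axis=(0, 2), keepdims=True)   # per-condition means
    t = T.mean(axis=(0, 1), keepdims=True)   # per-time means
    m = T.mean()                             # overall mean
    residue = T - g - c - t + 2 * m
    return float((residue ** 2).mean())

print(msr_3d(np.random.default_rng(0).normal(size=(5, 4, 3))))
```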
Biclustering of Gene Expression Data Based on SimUI Semantic Similarity Measure

Biclustering is an unsupervised machine learning technique that simultaneously clusters the genes and conditions of gene expression data. The Gene Ontology (GO) is usually used in this context to validate the biological relevance of the results. However, although the integration of biological information from different sources is one of the research directions in Bioinformatics, GO is not used in biclustering as input data. A scatter search-based algorithm that integrates GO information during the biclustering search process is presented in this paper. SimUI is a GO semantic similarity measure that defines a distance between two genes. The algorithm optimizes a fitness function that uses SimUI to integrate the biological information stored in GO. Experimental results analyze the effect of integrating the biological information through this measure, and a SimUI fitness function configuration is experimentally studied in a scatter search-based biclustering algorithm.

Juan A. Nepomuceno, Alicia Troncoso, Isabel A. Nepomuceno-Chamorro, Jesús S. Aguilar-Ruiz
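
SimUI is commonly defined as the ratio of shared to combined GO ancestor terms between the annotation sets of two genes; a toy sketch with an invented mini-ontology:

```python
# SimUI in its usual union-intersection form; the ancestor table below is a
# made-up stand-in for a real GO graph.
ancestors = {
    "GO:A": {"GO:A", "GO:root"},
    "GO:B": {"GO:B", "GO:root"},
    "GO:C": {"GO:C", "GO:A", "GO:root"},
}

def induced(terms):
    # All GO terms annotating a gene, plus their ancestors.
    return set().union(*(ancestors[t] for t in terms))

def sim_ui(gene1_terms, gene2_terms):
    s1, s2 = induced(gene1_terms), induced(gene2_terms)
    return len(s1 & s2) / len(s1 | s2)

print(sim_ui({"GO:C"}, {"GO:B"}))   # shares only GO:root -> 1/4 = 0.25
```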
Discovery of Genes Implied in Cancer by Genetic Algorithms and Association Rules

This work proposes a methodology to identify genes highly related to cancer. In particular, a multi-objective evolutionary algorithm named CANGAR is applied to obtain quantitative association rules. These rules are used to identify dependencies between genes and their expression levels. Hierarchical cluster analysis, fold-change and a review of the literature have been used to validate the relevance of the results obtained. The results show that the reported genes are consistent with prior knowledge and able to characterize colon cancer patients.

Alejandro Sánchez Medina, Alberto Gil Pichardo, Jose Manuel García-Heredia, María Martínez-Ballesteros
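
A quantitative association rule of the kind CANGAR evolves pairs interval conditions on expression levels; here is a toy scoring of one such rule, with intervals and data invented for illustration:

```python
# Toy rule "geneA in [0.2, 0.6] => geneB in [0.5, 0.9]" scored by support
# and confidence over a synthetic expression matrix.
import numpy as np

rng = np.random.default_rng(2)
expr = rng.uniform(size=(100, 2))    # columns: geneA, geneB expression levels

antecedent = (expr[:, 0] >= 0.2) & (expr[:, 0] <= 0.6)
consequent = (expr[:, 1] >= 0.5) & (expr[:, 1] <= 0.9)

support = np.mean(antecedent & consequent)
confidence = support / max(antecedent.mean(), 1e-12)
print(f"support={support:.2f} confidence={confidence:.2f}")
```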
Extending Probabilistic Encoding for Discovering Biclusters in Gene Expression Data

In this work, we have extended the experimental analysis of an encoding approach for evolutionary algorithms proposed in [1], called probabilistic encoding. The potential of this encoding for complex problems is huge, as candidate solutions represent regions, instead of points, of the search space. We have tested it in the context of the gene expression biclustering problem, on a selection of well-known expression matrix datasets. The results of the experimental analysis reveal a satisfactory performance in comparison with other evolutionary algorithms, and a high exploration power in very large search spaces.

Francisco Javier Gil-Cumbreras, Raúl Giráldez, Jesús S. Aguilar-Ruiz

Hybrid Intelligent Systems for Data Mining and Applications

Frontmatter
A Hybrid Approach to Closeness in the Framework of Order of Magnitude Qualitative Reasoning

Qualitative reasoning deals with information expressed in terms of qualitative classes and relations among them, such as comparability, negligibility or closeness. In this paper, we focus on the notion of closeness using a hybrid approach based on logic, order-of-magnitude reasoning, and the so-called proximity structures; these structures are used to decide which elements are close to each other. Some of the intuitions behind this approach are explained on the basis of examples. Moreover, we show some of the expressive capabilities of the logic for denoting particular positions of the proximity intervals.

Alfredo Burrieza, Emilio Muñoz-Velasco, Manuel Ojeda-Aciego
Hybrid Algorithm for Floor Detection Using GSM Signals in Indoor Localisation Task

One of the challenging problems of indoor localisation based on GSM fingerprints is the detection of the current floor. We propose an off-line algorithm that labels fingerprints with the number of the current floor. The algorithm uses one pass through a given route to learn the GSM fingerprints. After that, the height on test passes of the same route can be estimated with high accuracy, even for measurements registered at various velocities and a month after the learning process. The two-phase algorithm first detects the points of a potential floor change; next, a regression function normalises the height of the change and calculates its direction. The results obtained are up to 40% better than those obtained by pure regression.

Marcin Luckner, Rafał Górak
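
The change-point phase can be caricatured as follows; the fingerprints, the jump at step 120 and the 3-sigma threshold are all invented for illustration and are not the paper's actual detector:

```python
# Flag candidate floor-change points where consecutive GSM fingerprints
# shift abruptly; synthetic data with one planted change.
import numpy as np

rng = np.random.default_rng(5)
route = rng.normal(size=(200, 8))        # RSSI fingerprints along a route
route[120:] += 3.0                        # synthetic floor change at step 120

diffs = np.linalg.norm(np.diff(route, axis=0), axis=1)
candidates = np.where(diffs > diffs.mean() + 3 * diffs.std())[0]
print("candidate floor-change steps:", candidates)
```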
Hybrid Optimization Method Applied to Adaptive Splitting and Selection Algorithm

The paper presents an approach to training combined classifiers based on splitting the feature space and selecting the best classifier ensemble for each subspace. The learning method uses a hybrid algorithm that combines a Genetic Algorithm with the Cross-Entropy Method. The proposed approach was evaluated through comprehensive computer experiments run on balanced and imbalanced datasets, and compared with the Cluster and Selection algorithm, improving on the results obtained by that technique.

Pedro Lopez-Garcia, Michał Woźniak, Enrique Onieva, Asier Perallos
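
For readers unfamiliar with the Cross-Entropy Method half of the hybrid, a minimal CEM loop looks like this; the quadratic objective stands in for the actual ensemble-training loss:

```python
# Minimal Cross-Entropy Method: adapt a Gaussian sampling distribution
# towards the elite (lowest-loss) samples each generation.
import numpy as np

rng = np.random.default_rng(3)

def objective(x):                      # stand-in for ensemble-training loss
    return ((x - 1.5) ** 2).sum(axis=1)

mu, sigma = np.zeros(5), np.ones(5)
for _ in range(30):
    samples = rng.normal(mu, sigma, size=(50, 5))
    elite = samples[np.argsort(objective(samples))[:10]]   # best 20 %
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("found optimum ~", mu.round(2))
```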
Hybrid Intelligent Model for Fault Detection of a Lithium Iron Phosphate Power Cell Used in Electric Vehicles

Currently, electric mobility and buffering intermittent power generation facilities are two of the main applications of batteries. Batteries generally exhibit complex behavior. Due to their electrochemical nature, several tests are made to check their performance, and it is very useful to know a priori how they are working in each case. By checking battery temperature for specific voltage and current values, this work describes a hybrid intelligent model aimed at fault detection for an LFP (Lithium Iron Phosphate, LiFePO4) power cell of the type used in Electric Vehicles. A large set of operating points was obtained from a real system to create a dataset covering the operating range of the power cell. To build the solution, clusters of the different behavior zones were obtained, and simple regression methods were applied within each cluster. Polynomial Regression, Artificial Neural Networks and Support Vector Regression were the techniques combined to develop the proposed hybrid intelligent model. The novel hybrid model achieves good results over the whole operating range, detecting all the faults tested.

Héctor Quintián, José-Luis Casteleiro-Roca, Francisco Javier Perez-Castelo, José Luis Calvo-Rolle, Emilio Corchado
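
The cluster-then-regress structure described above can be sketched as follows; the synthetic voltage/current/temperature data, cluster count and residual threshold are assumptions, not the paper's values:

```python
# Cluster the operating range, fit one regressor per cluster, and flag a
# fault when measured temperature deviates from the cluster's prediction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.uniform(size=(300, 2))                 # [voltage, current], normalised
temp = 25 + 10 * X[:, 0] * X[:, 1] + rng.normal(scale=0.2, size=300)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {c: LinearRegression().fit(X[km.labels_ == c], temp[km.labels_ == c])
          for c in range(3)}

def is_fault(x, measured_temp, threshold=2.0):
    c = km.predict(x.reshape(1, -1))[0]
    predicted = models[c].predict(x.reshape(1, -1))[0]
    return abs(predicted - measured_temp) > threshold

print(is_fault(np.array([0.5, 0.5]), measured_temp=40.0))
```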
Backmatter
Metadata
Title
Hybrid Artificial Intelligent Systems
Edited by
Francisco Martínez-Álvarez
Alicia Troncoso
Héctor Quintián
Emilio Corchado
Copyright Year
2016
Electronic ISBN
978-3-319-32034-2
Print ISBN
978-3-319-32033-5
DOI
https://doi.org/10.1007/978-3-319-32034-2