
About this book

This volume constitutes the proceedings of the 10th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2015, held in Bilbao, Spain, in June 2015. The 60 papers published in this volume were carefully reviewed and selected from 190 submissions. They are organized in topical sections: data mining and knowledge discovery; video and image analysis; bio-inspired models and evolutionary computation; learning algorithms; hybrid intelligent systems for data mining and applications; classification and cluster analysis; and HAIS applications.

Table of Contents

Frontmatter

Data Mining and Knowledge Discovery

Frequent Sets Discovery in Privacy Preserving Quantitative Association Rules Mining

This paper deals with discovering frequent sets for quantitative association rules mining with preserved privacy. It focuses on privacy preservation at the individual level, where true individual values, e.g., values of attributes describing customers, are not revealed. Only distorted values and the parameters of the distortion procedure are public. Nevertheless, a miner can discover hidden knowledge, e.g., association rules, from the distorted data. To find frequent sets for quantitative association rules mining with preserved privacy, a miner must discretise continuous attributes, transform them into binary attributes, and then calculate the distortion parameters for the new attributes that result from discretisation and binarisation. The miner can then apply either MASK (Mining Associations with Secrecy Konstraints) or MMASK (Modified MASK) to find candidates for frequent sets and estimate their supports. This paper proposes a methodology for calculating the distortion parameters of attributes newly created by discretisation and binarisation for quantitative association rules mining. It also presents a new application of MMASK to finding frequent sets when discovering quantitative association rules with preserved privacy, and verifies this application experimentally on real data sets. The experimental results show that both MASK and MMASK can be applied to frequent sets mining for quantitative association rules with preserved privacy; however, MMASK gives better results in this task.

Piotr Andruszkiewicz
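
The idea in the abstract above, that supports can be estimated from distorted data once the distortion parameters are public, can be illustrated with a minimal sketch. It assumes a simple randomized distortion in which each binary value is kept with probability p and flipped otherwise, and it covers only the single-attribute case; the function names are hypothetical, and the paper's methodology for discretised and binarised attributes is not reproduced here.

```python
import random


def distort(bits, p, rng):
    """Keep each bit with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_support(distorted_bits, p):
    """Estimate the true fraction of 1s from distorted data.

    The observed fraction o satisfies o = p*s + (1 - p)*(1 - s),
    so s = (o - (1 - p)) / (2*p - 1), which is defined for p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 removes all information about the data")
    observed = sum(distorted_bits) / len(distorted_bits)
    return (observed - (1 - p)) / (2 * p - 1)


# Illustrative data only: a synthetic attribute with roughly 30% ones.
rng = random.Random(0)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
public_bits = distort(true_bits, p=0.9, rng=rng)

true_support = sum(true_bits) / len(true_bits)
estimate = estimate_support(public_bits, p=0.9)
print(f"true support {true_support:.3f}, estimated {estimate:.3f}")
```

The estimate becomes noisier as p approaches 0.5, which reflects the usual trade-off between privacy and accuracy in this kind of scheme.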

An Instance of Social Intelligence in the Internet of Things: Bread Making Recipe Recommendation by ELM Regression

The Social and Smart project proposes a new framework for the interaction between users and their household appliances, where social interaction becomes an intelligent social network of users and appliances that is able to provide intelligent responses to the needs of the users. In this paper we focus on one increasingly common appliance in European homes: the bread-maker. There are a number of satisfaction parameters which can be specified by the user: crustiness, fragrance, baking finish, and softness. A bread making recipe is composed mainly of the temperatures and times for each of the baking stages: first leavening, second leavening, precooking, cooking and browning. Although thorough real-life experimentation and data collection is being carried out by project partners, no data are available for training and testing yet. Thus, in order to test our ideas, we must resort to synthetic data generated using a very abstract model of the satisfaction parameters resulting from a given recipe. The recommendation in this context is carried out by a pair of Extreme Learning Machine (ELM) regression models: one is trained to predict the satisfaction parameters from the recipe input (the direct map), and the other learns the inverse mapping from the desired satisfaction to the bread-maker recipe. The inverse map makes it possible to provide recommendations to users given their preferences, while the direct map makes it possible to evaluate a recipe by predicting user satisfaction.

Manuel Graña, J. David Nuñez-Gonzalez

Random Forests and Gradient Boosting for Wind Energy Prediction

The ability of ensemble models to retain the bias of their learners while decreasing their individual variance has long made them quite attractive in a number of classification and regression problems. Moreover, when trees are used as learners, the relative simplicity of the resulting models has led to a renewed interest in them for Big Data problems. In this work we study the application of Random Forest Regression (RFR) and Gradient Boosted Regression (GBR) to global and local wind energy prediction problems, working with their high-quality implementations in the scikit-learn Python library. Besides a complete exploration of the application of RFR and GBR to wind energy prediction, we show experimentally that both ensemble methods can improve on SVR for individual wind farm energy prediction, and that at least GBR is also competitive when the interest lies in predicting wind energy on a much larger geographical scale.

Álvaro Alonso, Alberto Torres, José R. Dorronsoro

Agent-Based Web Resource Acquisition System for Scientific Knowledge Base

The paper summarizes the design, development, and deployment of the Web Resource Acquisition System as a means to gather knowledge and scientific resources for a common University Knowledge Base. This module was designed and developed under the SYNAT research project. The module uses a common logical data interface developed for this purpose and is integrated with the user presentation layer of the Knowledge Base of the Warsaw University of Technology. The work emphasizes the use of definitions and strategies in the context of the Knowledge Delivery problem. The presented solution can be interpreted as an alternative to web crawlers for the general problem of browsing through Internet data. In particular, effort was put into in-depth coverage of the requested domain of knowledge when specifying a query. At the same time, integration with the semi-automatic classification module was performed to support assessment of the retrieved resources with respect to their types. This resulted in the development of a Multi Agent System for universal resource delivery. Heterogeneous knowledge sources such as Bing, Google, CiteSeer, etc. were used to provide wide-ranging input data from the Internet.

Adam Omelczuk, Piotr Andruszkiewicz

An Efficient Nearest Neighbor Method for Protein Contact Prediction

A variety of approaches for protein inter-residue contact prediction have been developed in recent years. However, this problem is still far from solved. In this article, we present an efficient nearest neighbor (NN) approach, called PKK-PCP, and its application to protein inter-residue contact prediction. The great strength of this approach is its adaptability to that problem. Furthermore, our method considerably improves efficiency over other NN approaches. Our NN-based method combines parallel execution with a k-d tree as the search algorithm. The input data used by our algorithm is based on structural features and physico-chemical properties of amino acids, in addition to evolutionary information. The results obtained show better efficiency, in terms of time and memory consumption, than other similar approaches.

Gualberto Asencio-Cortés, Jesús S. Aguilar-Ruiz, Alfonso E. Márquez-Chamorro

Interface for Composing Queries for Complex Databases for Inexperienced Users

In most business activities, decision-making plays a very important role, since it may benefit or harm the business. Nowadays decision-making is based on information obtained from databases, which are directly accessible only to computer experts; however, the end-user who requires information from a database is not always a computer expert, so the need arises to allow inexperienced users to obtain information directly from a database. To this end, several kinds of tools are commercially available, such as visual query builders and natural language interfaces to databases (NLIDBs). However, the first kind of tool requires at least a basic level of knowledge of some formal query language, while NLIDBs, although users do not require training to use the interface, have not achieved the desired performance due to problems inherent to natural language processing. In this paper an intuitive interface is presented that allows inexperienced users to easily compose queries in SQL, without training in its operation or knowledge of SQL.

Rodolfo A. Pazos R., Alan G. Aguirre L., Marco A. Aguirre L., José A. Martínez F.

A Structural Pattern Mining Approach for Credit Risk Assessment

In recent years graph mining has taken a valuable step towards the efficient discovery of substructures in complex input data that do not fit into the usual data mining models. A graph is a general and powerful data representation formalism, which has found widespread application in many scientific fields. Finding subgraphs capable of compressing data by abstracting instances of the substructures and identifying interesting patterns is thus crucial. In financial settings, data is very complex, and in particular, when the relationships between risk factors are not taken into account, the quality of predictions is seriously affected. In this paper, we posit that risk analysis can be improved if structure is taken into account by discovering financial motifs in the input graphs. We use gBoost, which learns from graph data using a mathematical linear programming procedure combined with a substructure mining algorithm. An algorithm is proposed which has been shown to be efficient at extracting graph structure from feature vector data. Furthermore, we empirically show that the graph-mining model is competitive with state-of-the-art machine learning approaches in terms of classification accuracy, without an increase in computational cost.

Bernardete Ribeiro, Ning Chen, Alexander Kovačec

Video and Image Analysis

A Novel Technique for Human Face Recognition Using Fractal Code and Bi-dimensional Subspace

Face recognition is considered one of the best biometric methods for human identification and verification, because facial features differ from one person to another and because of its importance in the security field. This paper proposes an algorithm for face recognition and classification using a system based on WPD, fractal codes and a two-dimensional subspace for feature extraction, and a combined Learning Vector Quantization and PNN classifier as the neural network approach for classification. This paper thus presents a new approach for feature extraction and face recognition. Fractal codes, which are determined by a fractal encoding method, are used as features in this system. Fractal image compression is a relatively recent technique based on the representation of an image by a contractive transform whose fixed point is close to the original image. Each fractal code consists of five parameters, such as the corresponding domain coordinates for each range block, the brightness offset, and an affine transformation. The proposed approach is tested on the ORL and FEI face databases. Experimental results on these databases demonstrate the effectiveness of the proposed approach for face recognition, with high accuracy compared with previous methods.

Benouis Mohamed

A Platform for Matching Context in Real Time

Context-awareness is a key feature of Ambient Intelligence and future intelligent systems. In order to achieve context-aware behavior, applications must be able to detect context information, recognize situations and correctly decide on context-aware action. The representation of context information and the manner in which context is detected are central issues. Based on our previous work in which we used graphs to represent context and graph matching to detect situations, in this paper we present a platform that completely handles context matching, and does so in real time, in the background, by deferring matching to a component that acts incrementally, relying on previous matching results. The platform has been implemented and tested on an AAL-inspired scenario.

Andrei Olaru, Adina Magda Florea

Motion Capture Systems for Jump Analysis

This paper presents several methods used in motion capture to measure jumps. The traditional systems for acquiring jump information are force plates, but they are too expensive for most people. Amateur sports enthusiasts who want to improve their performance do not have enough money to spend on professional systems (±20,000 EUR). The falling price of electronic devices, specifically inertial measurement units (IMUs), is enabling new methods of motion capture. In this paper we present the state-of-the-art motion capture systems for this purpose, from the classical force plates to the most recently released IMUs. Noise reduction techniques, as an inherent part of motion capture systems, are also reviewed.

Sendoa Rojas-Lertxundi, J. Ramón Fernández-López, Sergio Huerta, Pablo García Bringas

Expert System for Handwritten Numeral Recognition Using Dynamic Zoning

This paper introduces an expert system for handwritten digit recognition. The system assumes that a handwritten numeric character can be decomposed into vertical and horizontal strokes. The positions where horizontal strokes are connected to the vertical strokes are then extracted as features using dynamic zoning. These features are laid out in a representative string, which is validated by a regular expression following a matching pattern. The knowledge base is constructed from a decision tree structure that stores all well-formatted representative strings together with the digit definitions. Finally, the inference engine tries to match unknown digits against the trained knowledge base in order to achieve recognition. The promising results obtained by testing the system on the well-known MNIST handwritten database are compared with other approaches to corroborate its effectiveness.

David Álvarez, Ramón Fernández, Lidia Sánchez, José Alija

Arabic Handwriting Recognition Based on Synchronous Multi-stream HMM Without Explicit Segmentation

In this study, we propose a synchronous Multi-Stream Hidden Markov Model (MSHMM) for offline Arabic handwritten word recognition. Our proposed model has the advantage of efficiently modelling the temporal interaction between multiple features. These features are a combination of statistical and structural ones, which are extracted over the columns and rows using a sliding window approach. Word models are implemented based on the holistic and analytical approaches without any explicit segmentation. In the first approach, all the words share the same architecture but the parameters are different. In the second approach, by contrast, each word has its own model, built by concatenating its character models. The results obtained on the IFN/ENIT database show that the analytical approach performs better than the holistic one and that MSHMMs are reliable for Arabic handwriting recognition.

Khaoula Jayech, Mohamed Ali Mahjoub, Najoua Essoukri Ben Amara

Image Segmentation Based on Hybrid Adaptive Active Contour

In this paper, we focus on segmentation based on the active contour model. We present a hybrid adaptive active contour segmentation algorithm. In this approach, we merge a global and an adaptive local active contour model in order to segment images. The proposed energy is then minimized using the level set method. Experiments show the good segmentation results provided by the proposed method.

Amira Soudani, Ezzeddine Zagrouba

Particle Swarm Optimizer with Finite Velocity of Information Transmission

The Particle Swarm Algorithm is based on the capacity of the particles that make up the swarm to share and communicate relevant information about the best positions visited: localbest and globalbest. Independently of the position of the particles, all particles know the best position visited by any other particle in the same time-step in which it is reached. However, in the real world, information takes some time to travel between the positions of two particles. In this paper, the effect of a finite velocity of information transmission on the performance of the Particle Swarm Algorithm is analysed. Two scenarios appear in this context: first, when the velocity of information transmission is almost equal to the maximum velocity of the particles; and second, when it is much larger. This study clarifies the role played by a finite velocity of information transmission in the performance of the algorithm, especially when it is almost equal to the maximum velocity of the particles.

Miguel Cárdenas-Montes, Miguel A. Vega-Rodríguez

Bio-inspired Models and Evolutionary Computation

Cryptanalysis of Simplified-AES Using Intelligent Agent

Software agent technology is a rapidly developing area of research. In this paper, we introduce a new application of an agent system, called the cryptanalytic-agent system, whose behaviour is intelligent enough to attack the Simplified Advanced Encryption Standard (S-AES) block cipher. Our results confirm the versatility of our proposed approach.

Rania Saeed, Ashraf Bhery

A Discrete Bat Algorithm for the Community Detection Problem

Community detection in networks has become an important research topic in recent years. The problem of detecting communities can be modeled as an optimization problem in which a quality objective function, one that captures the intuition of a community as a set of nodes with better internal connectivity than external connectivity, is selected to be optimized. In this work the Bat algorithm (BA) was used as an optimization algorithm to solve the community detection problem. The Bat algorithm is a new nature-inspired metaheuristic algorithm that has proved its good performance in a variety of applications. However, the algorithm's performance is influenced directly by the quality function used in the optimization process. Experiments on real-life networks show the ability of the Bat algorithm to successfully discover an optimized community structure based on the quality function used, and also demonstrate the limitations of the BA when applied to the community detection problem.

Eslam A. Hassan, Ahmed Ibrahem Hafez, Aboul Ella Hassanien, Aly A. Fahmy

Emergence of Cooperation Through Simulation of Moral Behavior

Human behavior can be analysed from a moral perspective when considering strategies for cooperation in evolutionary games. Presuming a multiagent task performed by self-centered agents, artificial moral behavior could bring about the emergence of cooperation as a consequence of the computational model itself. Herein we present results from our MultiA computational architecture, derived from a biologically inspired model and designed to simulate moral behavior through an Empathy module. Our testbed is a multiagent game previously defined in the literature, in which the lack of cooperation may cause a cascading failure effect (“bankruptcy”) that impacts the global network topology via local neighborhood interactions. Starting with sensory information originating from the environment, MultiA transforms it into basic and social artificial emotions and feelings. Its own emotions are then employed to estimate the current state of other agents through the Empathy module. Finally, the artificial feelings of MultiA provide a measure (called well-being) of its performance in response to the environment. Through that measure and reinforcement learning techniques, MultiA learns a mapping from emotions to actions. Results indicate that strategies relying on the simulation of moral behavior may indeed help to decrease the internal reward from the selfish selection of actions, thus favoring cooperation as an emergent property of multiagent systems.

Fernanda Monteiro Eliott, Carlos Henrique Costa Ribeiro

MC-PSO/DE Hybrid with Repulsive Strategy – Initial Study

This initial study describes a possible hybridization of an advanced Particle Swarm Optimization (PSO) modification called MC-PSO with the Differential Evolution (DE) algorithm. The advantage of hybridizing various evolutionary techniques is the shared benefit from the strengths of each method. The motivation came from previous studies of the performance and behavior of MC-PSO. The performance of the proposed method is tested on the IEEE CEC 2013 benchmark set and compared with both PSO and DE.

Michal Pluhacek, Roman Senkerik, Ivan Zelinka, Donald Davendra

OVRP_ICA: An Imperialist-Based Optimization Algorithm for the Open Vehicle Routing Problem

The open vehicle routing problem (OVRP) is one of the most important problems in vehicle routing and has attracted great interest in several recent industrial applications. The purpose of solving the OVRP is to decrease the number of vehicles and to reduce the travel distance and time of the vehicles. In this article, a new meta-heuristic algorithm called OVRP_ICA is presented for this problem. The OVRP is a combinatorial optimization problem in which a homogeneous fleet of vehicles, which do not necessarily return to the initial depot, serves a set of customers; OVRP_ICA solves it by exploiting the imperialist competitive algorithm. OVRP_ICA is compared with some well-known state-of-the-art algorithms, and the results confirm that it is highly efficient in solving this problem.

Shahab Shamshirband, Mohammad Shojafar, Ali Asghar Rahmani Hosseinabadi, Ajith Abraham

New Adaptive Approach for Multi-chaotic Differential Evolution Concept

This research deals with the hybridization of two soft computing fields: chaos theory and evolutionary computation. The paper investigates an adaptive multi-chaos-driven concept for the Differential Evolution (DE) evolutionary algorithm. It embeds, and adaptively alternates between, two discrete dissipative chaotic systems in the form of chaotic pseudo-random number generators for DE. A novel adaptive concept of the DE/rand/1/bin strategy driven alternately by two chaotic maps (systems) is introduced. Previous research obtained very promising results through the use of different chaotic maps, which have unique properties in connection with DE. The idea is then to combine these two different influences on the performance of DE into one adaptive multi-chaotic concept with automatic switching, without prior knowledge of the optimization problem and without any manual setting of the “switching point”. Repeated simulations were performed on the IEEE CEC 13 benchmark set. Finally, the obtained results are compared with jDE, a state-of-the-art adaptive representative.

Roman Senkerik, Michal Pluhacek, Donald Davendra, Ivan Zelinka, Jakub Janostik

Automatic Design of Radial Basis Function Networks Through Enhanced Differential Evolution

During the creation of a classification model, it is vital to keep track of numerous parameters and to produce a model based on limited knowledge, often inferred from very limited data. Methods which aid the construction of the classification model, or build it completely automatically, are a fairly common research interest. This paper proposes an approach that employs differential evolution, enhanced through the incorporation of additional knowledge concerning the problem, in order to design a radial basis function neural network. The knowledge is inferred from an unsupervised learning procedure which aims to ensure an initial population of good solutions. Also, the search space is dynamically adjusted, i.e., narrowed, during runtime in terms of the number of decision variables. The results obtained on several datasets suggest that the proposed approach is able to find well-performing networks while keeping the structure simple. Furthermore, a comparison with a differential evolution algorithm without the proposed enhancements and with a particle swarm optimization algorithm was carried out, illustrating the benefits of the proposed approach.

Dražen Bajer, Bruno Zorić, Goran Martinović

Performance Evaluation of Ant Colony Systems for the Single-Depot Multiple Traveling Salesman Problem

Derived from the well-known Traveling Salesman problem (TSP), the multiple-Traveling Salesman problem (multiple-TSP) with single depot is a straightforward generalization: several salesmen located in a given city (the depot) need to visit a set of interconnected cities, such that each city is visited exactly once (by a single salesman) while the total cost of their tours is minimized. Designed for shortest path problems and with proven efficiency for TSP, Ant Colony Systems (ACS) are a natural choice for multiple-TSP as well. Although several variations of ant algorithms for multiple-TSP are reported in the literature, there is no clear evidence on their comparative performance. The contribution of this paper is twofold: it provides a benchmark for single-depot-multiple-TSP with reported optima and performs a thorough experimental evaluation of several variations of the ACS on this problem.

Raluca Necula, Mihaela Breaban, Madalina Raschip

A Metaheuristic Hybridization Within a Holonic Multiagent Model for the Flexible Job Shop Problem

The Flexible Job Shop scheduling Problem (FJSP) is an extension of the classical Job Shop scheduling Problem (JSP) that allows each operation to be processed on one machine out of a set of alternative machines. It is an NP-hard problem consisting of two sub-problems: the assignment problem and the scheduling problem. This paper proposes a hybridization of a genetic algorithm with a tabu search within a holonic multiagent model for the FJSP. Firstly, a scheduler agent applies a Neighborhood-based Genetic Algorithm (NGA) for a global exploration of the search space. Secondly, a set of cluster agents uses a local search technique to guide the search in promising regions. Numerical tests are carried out to evaluate our approach, based on two sets of benchmark instances from the FJSP literature: Brandimarte and Hurink. The experimental results show the efficiency of our approach in comparison to other approaches.

Houssem Eddine Nouri, Olfa Belkahla Driss, Khaled Ghédira

Quantum Evolutionary Methods for Real Value Problems

We investigate a modified Quantum Evolutionary method for solving real value problems. Quantum Inspired Evolutionary Algorithms (QIEA) are binary encoded evolutionary techniques used for solving binary encoded problems, and their signature feature mimics the superposition of multiple states on a quantum bit. This is usually implemented by sampling a binary chromosome string according to probabilities stored in an underlying probability string. In order to apply this paradigm to real value problems, real QIEAs (rQIEA) were developed using real encoding while trying to follow the original quantum computing metaphor. In this paper we report the issues we encountered while implementing some of the published techniques. Firstly, we found that the investigated rQIEAs tend to stray from the original quantum computing interpretation, and secondly, their performance on a number of test problems was not as good as claimed in the original publications. Subsequently, we investigated further and developed a binary QIEA for use with real value problems. In general, the investigated and designed quantum method for real-value problems produced better convergence on most of the examined problems and showed very few inferior results.

Jonathan Wright, Ivan Jordanov
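
The sampling step mentioned in the abstract above, drawing a binary chromosome from an underlying probability string, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' algorithm: the function names, the OneMax fitness, and the linear update rule (used here as a simple stand-in for the rotation-gate update found in the QIEA literature) are all hypothetical choices.

```python
import random


def sample_chromosome(probabilities, rng):
    """Draw one binary string; bit i is 1 with probability probabilities[i]."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def shift_towards(probabilities, best, rate):
    """Move each probability a fraction `rate` of the way towards the best bit.

    Assumption: this linear rule replaces the rotation-gate update and is
    used only to keep the sketch short.
    """
    return [p + rate * (b - p) for p, b in zip(probabilities, best)]


def one_max(bits):
    """Toy fitness function: the number of 1 bits."""
    return sum(bits)


rng = random.Random(1)
probs = [0.5] * 16                      # uniform start: every bit undecided
best = sample_chromosome(probs, rng)
for _ in range(200):
    candidate = sample_chromosome(probs, rng)
    if one_max(candidate) > one_max(best):
        best = candidate
    probs = shift_towards(probs, best, rate=0.1)

print(one_max(best), [round(p, 2) for p in probs])
```

Because each probability is pulled towards the best string found so far, sampling concentrates around it over time; real value problems would additionally need a decoding step from bit strings to real numbers.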

A Modified Wind Driven Optimization Model for Global Continuous Optimization

Metaheuristics have been proposed as an alternative to mathematical optimization methods for addressing non-convex problems involving large search spaces. Within this context, a promising new metaheuristic inspired by phenomena in the Earth's atmosphere, termed Wind Driven Optimization (WDO), has been developed by Bayraktar. WDO has been successfully applied to solve continuous optimization problems. However, it requires tuning several parameters and may lead to premature convergence. In this paper the basic WDO is modified to improve the search capabilities of the algorithm and to reduce the number of tunable parameters. In the proposed variant of WDO, the original model equation is modified by introducing a pressure-based term to replace the rank-based term. Furthermore, the value of the gravitational term is set automatically and adaptively. The performance of the proposed modified WDO has been assessed using several benchmarks in numerical optimization. The obtained results show that the modified WDO outperforms the original WDO on most test problems in terms of both accuracy and robustness.

Abdennour Boulesnane, Souham Meshoul

Learning Algorithms

Input Filters Implementing Diversity in Ensemble of Neural Networks

This paper discusses how input filters can be used to improve the performance of an ensemble of neural-network-based classifiers. The proposed method is based on filtering the input vectors in the training set, which minimizes the demands on data preprocessing. Our approach builds on a technique called boosting, which is based on the principle of combining a large number of so-called weak classifiers into a strong classifier. In the experimental study, we verified that such classifiers are able to classify the submitted data adequately into predefined classes without knowledge of the details of their significance.

Eva Volna, Martin Kotyrba, Vaclav Kocian

Learning-Based Multi-agent System for Solving Combinatorial Optimization Problems: A New Architecture

Solving combinatorial optimization problems is an important challenge in all engineering applications. Researchers have extensively tackled these problems using evolutionary computation. This paper introduces a novel learning-based multi-agent system (LBMAS) in which all agents cooperate by acting on a common population and a two-stage archive containing the promising fitness-based and position-based solutions found so far. Metaheuristics, acting as agents, each perform their own method individually and then share their outcomes. In this way, even though individual performance may be low, the collaboration of metaheuristics leads the system to high performance. In this system, solutions are modified by all running metaheuristics, and the system gradually learns how promising each metaheuristic is, in order to apply them according to their effectiveness. Finally, the performance of LBMAS is experimentally evaluated on the Multiprocessor Scheduling Problem (MSP), a prominent combinatorial optimization problem. The results obtained, in comparison to well-known competitors, show that our multi-agent system achieves better results in reasonable running times.

Nasser Lotfi, Adnan Acan

A Novel Approach to Detect Single and Multiple Faults in Complex Systems Based on Soft Computing Techniques

To ensure the reliability of complex systems and to extend their life cycle, it is crucial to correct any faults properly and in a timely manner. In this context, this paper proposes an intelligent approach to detecting single and multiple faults in complex systems based on soft computing techniques. This approach is based on the combination of fuzzy logic reasoning and Artificial Fish Swarm optimization. The experiments focus on a simulation of the three-tank hydraulic system, a benchmark in the diagnosis domain.

Imtiez Fliss, Moncef Tagina

Using Mouse Dynamics to Assess Stress During Online Exams

Stress is a highly complex, subjective and multidimensional phenomenon. Nonetheless, it is also one of our strongest driving forces, pushing us forward and preparing our body and mind to tackle daily challenges, independently of their nature. The duality of the effects of stress, which can be positive or negative, calls for approaches that can make the best of this biological mechanism, providing means for people to cope effectively with stress. In this paper we propose an approach, based on mouse dynamics, to assess the level of stress of students during online exams. Results show that mouse dynamics change in a consistent manner as stress sets in, allowing it to be estimated from an analysis of mouse usage. This approach makes it possible to understand how each individual student is affected by stress, providing additional valuable information for educational institutions to efficiently adapt and improve their teaching processes.

Davide Carneiro, Paulo Novais, José Miguel Pêgo, Nuno Sousa, José Neves

Modeling Users Emotional State for an Enhanced Human-Machine Interaction

Spoken conversational agents have been proposed to enable a more natural and intuitive interaction with the environment and human-computer interfaces. In this paper, we propose a framework to model the user’s emotional state during the dialog and adapt the dialog model dynamically, thus developing more efficient, adapted, and usable conversational agents. We have evaluated our proposal by developing a user-adapted agent that provides tourist information, and we provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the perceived quality.

David Griol, José Manuel Molina

Hybrid Intelligent Systems for Data Mining and Applications

Frontmatter

Predicting PM10 Concentrations Using Fuzzy Kriging

The prediction of meteorological phenomena is usually based on the creation of a surface from point sources using a certain type of interpolation algorithm. The prediction typically does not incorporate any kind of uncertainty, either in the calculation itself or in its results. The selection of the interpolation method, as well as of its parameters, depends on the user and their experience. This is not necessarily a problem. However, when modelling the spatial distribution of potentially dangerous air pollutants, inappropriately selected parameters and models may cause inaccuracies in the results and their evaluation. In this contribution, we propose prediction using fuzzy kriging, which allows expert knowledge to be incorporated. We combined previously presented approaches with simulated annealing, a probabilistic metaheuristic optimization method. The application of this approach in a real situation is demonstrated on the prediction of PM10 particles in the air in the Czech Republic.

Jan Caha, Lukáš Marek, Jiří Dvorský

Neuro-Fuzzy Analysis of Atmospheric Pollution

The present study proposes the application of different soft-computing and statistical techniques to the characterization of atmospheric conditions in Spain. The main goal is to visualize and analyze the air quality in a certain region of Spain (Madrid) to better understand its circumstances and evolution. To do so, real-life data from three data acquisition stations are analysed. The main pollutants recorded by these stations are studied in order to investigate how the geographical location of these stations and the different seasons of the year influence the behavior of air pollution. Different techniques for dimensionality reduction together with clustering techniques have been applied, in a combination of neural and fuzzy paradigms.

Ángel Arroyo, Verónica Tricio, Emilio Corchado, Álvaro Herrero

Improving Earthquake Prediction with Principal Component Analysis: Application to Chile

Increasing attention has been paid to the prediction of earthquakes with data mining techniques during the last decade. Several works have already proposed the use of certain features serving as inputs for supervised classifiers. However, so far they have been used, successfully, without any further transformation. In this work, the use of principal component analysis to reduce data dimensionality and generate new datasets is proposed. In particular, this step is inserted into a methodology that has already been used successfully to predict earthquakes. Santiago and Pichilemu, two of the cities in Chile most threatened by the occurrence of large earthquakes, are studied. Several well-known classifiers combined with principal component analysis have been used. Noticeable improvement in the results is reported.

Gualberto Asencio-Cortés, Francisco Martínez-Álvarez, Antonio Morales-Esteban, Jorge Reyes, Alicia Troncoso

Detecting Anomalies in Embedded Computing Systems via a Novel HMM-Based Machine Learning Approach

Computing systems are vulnerable to anomalies that might occur during execution of deployed software: e.g., faults, bugs or deadlocks. When occurring on embedded computing systems, these anomalies may severely hamper the corresponding devices; on the other hand, embedded systems are designed to perform autonomously, i.e., without any human intervention, and thus it is difficult to debug an application to manage the anomaly. Runtime anomaly detection techniques are the primary means of being aware of anomalous conditions. In this paper, we describe a novel approach to detect an anomaly during the execution of one or more applications. Our approach describes the behaviour of the applications using the sequences of memory references generated during runtime. The memory references are seen as signals: they are divided into overlapping frames, then parametrized and finally described with Hidden Markov Models (HMM) for detecting anomalies. The motivations for using such a methodology for embedded systems are the following: first, the memory references can be extracted with very low overhead using software or architectural tools. Second, the device HMM analysis framework, while being very powerful in gathering high-level information, has low computational complexity and thus is suitable for the rather low memory and computational capabilities of embedded systems. We experimentally evaluated our proposal on an ARM9, Linux-based embedded system using the SPEC 2006 CPU benchmark suite and found that it shows very low error rates for some artificially injected anomalies, namely malware, an infinite loop and random errors during execution.

Alfredo Cuzzocrea, Eric Medvet, Enzo Mumolo, Riccardo Cecolin

Using Dalvik Opcodes for Malware Detection on Android

Over the last few years, computers and smartphones have become essential tools in the way we communicate with each other. The number of applications in the Google store has grown exponentially, and malware developers have introduced malicious applications into that market. The Android system uses the Dalvik virtual machine. Through reverse engineering, we may be able to obtain the different opcodes for each application. For this reason, this paper presents an approach to detect malware on Android that uses reverse engineering techniques and puts an emphasis on the operational codes used by these applications. After obtaining these opcodes, machine learning techniques are used to classify apps.

José Gaviria de la Puerta, Borja Sanz, Igor Santos, Pablo García Bringas

A Method to Encrypt 3D Solid Objects Based on Three-Dimensional Cellular Automata

In this work a novel encryption algorithm to assure the confidentiality of 3D solid objects is introduced. The encryption method consists of two phases: the confusion phase and the diffusion phase. In the first one a three-dimensional chaotic Cat map is applied N times, whereas in the diffusion phase a 2nd-order memory reversible 3D cellular automaton is evolved T times during M rounds. The encryption method is shown to be secure against the most important cryptanalytic attacks: statistical attacks, differential attacks, etc.

A. Martín del Rey

Exemplar Selection Using Collaborative Neighbor Representation

Retrieving the most relevant exemplars in image databases has been a difficult task. Most exemplar selection methods were proposed and developed to work with a specific classifier. Research in exemplar selection is targeting schemes that can benefit a wide range of classifiers. Recently, the Sparse Modeling Representative Selection (SMRS) method has been proposed for selecting the most relevant instances. SMRS is based on data self-representation in the sense that it estimates a coding matrix using a codebook set to the data themselves. The matrix coefficients are estimated using a block sparsity constraint. In this paper, we propose a coding scheme based on a two-stage Collaborative Neighbor Representation in which the matrix of coefficients is estimated without any explicit sparse coding. For the second stage, we introduce two schemes for sample pruning. Experiments are conducted on summarizing two video movies. We also provide quantitative performance evaluation via classification on the selected prototypes. To this end, one face dataset, one handwritten digits dataset, and one object dataset are used. These experiments showed that the proposed method can outperform state-of-the-art methods including the SMRS method.

F. Dornaika, I. Kamal Aldine, B. Cases

On Sentiment Polarity Assignment in the Wordnet Using Loopy Belief Propagation

Sentiment analysis is a very active and currently much-addressed research area. One of the problems in sentiment analysis is the classification of text in terms of its attitude, especially in reviews or comments from social media. In general, this problem can be solved by two different approaches: machine learning methods and lexicon-based methods. Lexicon-based methods require properly prepared lexicons, which are usually obtained manually from experts at a high cost in time and resources. This paper aims at automatic lexicon creation for sentiment analysis. Methods based on Loopy Belief Propagation are proposed that, starting from a small set of seed words with a priori known sentiment values, propagate the sentiment to the whole Wordnet.

Marcin Kulisiewicz, Tomasz Kajdanowicz, Przemyslaw Kazienko, Maciej Piasecki

Classification and Cluster Analysis

Frontmatter

Evaluation of Relative Indexes for Multi-objective Clustering

One of the biggest challenges in clustering is finding a robust and versatile criterion to evaluate the quality of clustering results. In this paper, we investigate the extent to which unsupervised criteria can be used to obtain clusters highly correlated to external labels. We show that the usefulness of these criteria is data-dependent and that, for most data sets, multiple criteria are required in order to identify the best performing clustering algorithm. We present a multi-objective evolutionary clustering algorithm capable of finding a set of high-quality solutions. For the real-world data sets examined, the Pareto front can offer better clusterings than simply optimizing a single unsupervised criterion.

Tomáš Bartoň, Pavel Kordík

A Hybrid Analytic Hierarchy Process for Clustering and Ranking Best Location for Logistics Distribution Center

Facility location decisions play a critical role in the strategic design of supply chain networks. This paper discusses the facility location problem with a focus on a logistics distribution center (LDC) in the Balkan Peninsula. A hybrid methodology combining the Analytical Hierarchy Process (AHP) and the k-means method is proposed here, and it is shown how such a model can be of assistance in analyzing a multi-criteria decision-making problem. This research represents a continuation of two existing studies: (1) the PROMETHEE II ranking method; and (2) a combination of a Greedy heuristic algorithm and AHP. The experimental results of our research compare well with other official results of the feasibility study of the LDC located in the Balkan Peninsula.

Dragan Simić, Vladimir Ilin, Ilija Tanackov, Vasa Svirčević, Svetlana Simić

Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels

Multilabel classification is a task that has been broadly studied in recent years. However, how to learn from imbalanced multilabel datasets (MLDs) has only recently been addressed. In this regard, a few proposals can be found in the literature, most of them based on resampling techniques adapted from the traditional classification field. The success of these methods varies greatly depending on the traits of the chosen MLDs.

One of the characteristics which significantly influences the behavior of multilabel resampling algorithms is the joint appearance of minority and majority labels in the same instances. It was demonstrated that MLDs with a high level of concurrence among imbalanced labels could hardly benefit from resampling methods. This paper proposes an original resampling algorithm, called REMEDIAL, which is based neither on removing majority instances nor on creating minority ones, but on a procedure to decouple highly imbalanced labels. As will be experimentally demonstrated, this is an interesting approach for certain MLDs.

Francisco Charte, Antonio Rivera, María José del Jesus, Francisco Herrera

Creating Effective Error Correcting Output Codes for Multiclass Classification

The error correcting output code (ECOC) technique is a general framework for solving multi-class problems using binary classifiers. The key problem in this approach is how to construct the optimal ECOC codewords, i.e., the codewords which maximize the recognition ratio of the final classifier. There are several methods described in the literature to solve this problem. All these methods try to maximize the minimal Hamming distance between the generated codewords. In this paper we present another approach based both on the average Hamming distance and on the estimated misclassification error of the binary classifiers.

Wiesław Chmielnicki
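
The two distance quantities named in the abstract above, the minimal and the average pairwise Hamming distance between codewords, can be illustrated with a minimal Python sketch. It is not the paper's construction method, which also uses the estimated misclassification error of the binary classifiers. The function names and the 4-class code matrix are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_stats(codewords):
    """Return (minimal, average) pairwise Hamming distance of a code matrix."""
    dists = [hamming(a, b) for a, b in combinations(codewords, 2)]
    return min(dists), sum(dists) / len(dists)


# Hypothetical code matrix: one row (codeword) per class,
# one column per binary classifier.
code = ["00000", "01011", "10101", "11110"]
min_d, avg_d = distance_stats(code)
print(min_d, round(avg_d, 3))
```

Two code matrices can share the same minimal distance and still differ in their average distance, which is one reason a criterion based on the average can rank candidates differently from one based on the minimum alone.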

FM3S: Features-Based Measure of Sentences Semantic Similarity

The aim of research on measuring Semantic Similarity (SS) between sentences is to find a method that can simulate the thinking process of humans. In fact, it has become an important task in several applications, including Artificial Intelligence and Natural Language Processing. Though this task depends strongly on word SS, the latter is not the only important feature. The current paper presents a new method for computing sentence semantic similarity by exploiting a set of its characteristics, namely the Features-based Measure of Sentences Semantic Similarity (FM3S). The proposed method aggregates three components in a non-linear function: the noun-based SS including the compound nouns, the verb-based SS using the tense information, and the common word order similarity. It measures the semantic similarity between concepts that play the same syntactic role. Concerning the word-based semantic similarity, an information content-based measure is used to estimate the SS degree between words by exploiting the WordNet “is a” taxonomy. The proposed method yielded competitive results compared to previously proposed measures on Li’s benchmark, showing a high correlation with human ratings. Further experiments performed on the Microsoft Paraphrase Corpus showed the best F-measure values compared to other measures for high similarity thresholds. The results obtained with FM3S demonstrate the importance of syntactic information, compound nouns, and verb tense in the process of computing sentence semantic similarity.

Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, Yosra Bourouis

Improving Enzyme Function Classification Performance Based on Score Fusion Method

Enzymes are important in our lives and play a vital role in most biological processes. Computational classification of enzyme function is necessary to save effort and time. In this paper, an information fusion-based approach is proposed. The unknown sequence is classified by aligning it with all labelled sequences using local pairwise sequence alignment based on different score matrices. The outputs of all pairwise sequence alignment processes are represented by a set of scores. The scores of the alignment processes are combined using simple fusion rules. The fusion-based approach achieved better results than all individual sequence alignment processes.

Alaa Tharwat, Mahir M. Sharif, Aboul Ella Hassanien, Hesham A. Hefeny
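
The abstract above does not state which fusion rules are used, so the sketch below assumes three common ones (sum, max and min) applied element-wise to alignment scores obtained under different score matrices. The labels, the score values and the function names are hypothetical. Scores are assumed to be on a comparable scale already; in practice they would likely need normalizing before fusion.

```python
def fuse(score_lists, rule="sum"):
    """Combine per-matrix score lists element-wise with a simple fusion rule."""
    rules = {"sum": sum, "max": max, "min": min}
    if rule not in rules:
        raise ValueError(f"unknown fusion rule: {rule}")
    # zip(*score_lists) groups the scores that refer to the same labelled sequence.
    return [rules[rule](column) for column in zip(*score_lists)]


def classify(score_lists, labels, rule="sum"):
    """Assign the label of the labelled sequence with the highest fused score."""
    fused = fuse(score_lists, rule)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical alignment scores of one unknown sequence against three
# labelled sequences, under three different score matrices.
labels = ["class_1", "class_2", "class_3"]
scores_a = [52.0, 47.0, 30.0]
scores_b = [40.0, 55.0, 28.0]
scores_c = [45.0, 50.0, 33.0]

print(classify([scores_a], labels))                      # single matrix
print(classify([scores_a, scores_b, scores_c], labels))  # fused decision
```

In this toy data the first matrix alone favours a different class from the fused decision, which is the kind of disagreement a fusion rule is meant to resolve.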

A Low-Power Context-Aware System for Smartphone Using Hierarchical Modular Bayesian Networks

Various applications using the sensors and devices on smartphones are being developed. However, since limited battery capacity does not allow the phone to be used all the time, research on increasing phone use-time is very active. In this paper, we propose a hybrid system to increase the battery life of the phone. The user’s context is recognized through hierarchical modular Bayesian networks, and unnecessary devices are inferred through device management rules. By inferring the user’s context from sensor data, and by considering device status, the inferred context and the user’s tendencies, we determine the device which is consuming the most battery. In experiments with real log data collected from 28 people over six months, the proposed system achieved an accuracy of 85.68 % and an improvement in battery consumption of about 6 %.

Jae-Min Yu, Sung-Bae Cho

HAIS Applications

Frontmatter

A Parallel Meta-heuristic for Solving a Multiple Asymmetric Traveling Salesman Problem with Simultaneous Pickup and Delivery Modeling Demand Responsive Transport Problems

Transportation is an essential area in today’s society. Due to rapid technological progress, it has gained great importance, both for the business sector and for citizens. Among the different types of transport, one that has recently gained prominence is on-demand transportation, because it can have a very positive effect on people’s quality of life. There are different kinds of on-demand transportation systems, Demand Responsive Transit (DRT) being one of the most important. In this work, a real-life DRT problem is proposed and modeled as a Rich Traveling Salesman Problem. Specifically, the problem presented is a Multiple Asymmetric Traveling Salesman Problem with Simultaneous Pickup and Delivery. Furthermore, a benchmark for this new problem is also proposed, and a first solution to it is offered. To solve this benchmark, the recently developed Golden Ball meta-heuristic has been implemented.

E. Osaba, F. Diaz, E. Onieva, Pedro López-García, R. Carballedo, A. Perallos

Self-Organizing Maps Fusion: An Approach to Different Size Maps

A set of neural networks working in an ensemble can lead to better classification results than a single neural network. In the ensemble, the results of each neural network are fused, resulting in a better generalization of the model. The Kohonen Self-Organizing Map is known as a method for dimensionality reduction, data visualization and also data classification. This work presents a methodology to fuse Kohonen Self-Organizing Maps of different sizes, with the objective of improving classification accuracy. A factorial experiment was conducted in order to test the influence of some variables. Computational simulations with some datasets from the UCI Machine Learning Repository and from the Fundamental Clustering Problems Suite demonstrate an increase in classification accuracy, and the feasibility of the proposed method was evidenced by the Wilcoxon Signed Rank Test.

Leandro Antonio Pasa, José Alfredo F. Costa, Marcial Guerra de Medeiros

Cloud Robotics in FIWARE: A Proof of Concept

Novel Cloud infrastructures and their extensive set of resources have the potential to help robotics overcome its limitations. Traditionally, those limitations have been related to the number of sensors with which robots are equipped and to their computational power. The drawbacks of these limitations can be reduced by using the benefits of cloud architectures such as cloud computing, Internet of Things (IoT) sensing and cloud storage. FIWARE is an open platform which integrates cloud capabilities and Generic Enablers (GE) to interact with the cloud. This paper proposes the development of a Robotics GE and presents the integration of the new GE into the FIWARE architecture. This integration has two main goals: first, to bring all the benefits that FIWARE provides to robotics, and second, to make it easier for developers who are not robotics experts to develop robotics applications. Finally, a real example of the integration is shown by means of a parking meter application that combines context information, robotics, and cloud computing of vision algorithms.

F. Herranz, J. Jaime, I. González, Á. Hernández

Comparing Measurement and State Vector Data Fusion Algorithms for Mobile Phone Tracking Using A-GPS and U-TDOA Measurements

Multi-Sensor Data Fusion (MSDF) has become a research area in different disciplines, including science and engineering. Multi-sensor data fusion techniques are applied to enhance the reliability and accuracy of sensor measurements. The aim of this paper is to evaluate the estimation performance of measurement fusion and state vector fusion algorithms in tracking a moving mobile phone throughout a vehicle’s journey. These two algorithms, both based on the Kalman Filter, are implemented in the tracking system. Performance evaluation is computed using MATLAB, and the analysis shows that the position and velocity estimation accuracy of the measurement fusion algorithm is better than that of the state vector fusion algorithm.

Ayalew Belay Habtie, Ajith Abraham, Dida Midekso

Hybrid U-TDOA and A-GPS for Vehicle Positioning and Tracking

Due to the current emerging interest in location-based services, 3G and 4G cellular networks provide a key facility to locate the user equipment (UE). Vehicle positioning and tracking using the UE traveling on board the vehicle is one of the value-added location-based services enabled by this feature and has been studied by different researchers. However, there is no single standard UE positioning technique that can provide better accuracy and coverage. To address this problem, we propose a measurement-fusion-based hybrid UE positioning method that combines measurements collected from A-GPS mobiles with simulated estimates of U-TDOA. Kalman Filter based filtering of positions and velocities, as well as accuracy values determined from measurement errors, demonstrate that the proposed hybrid UE positioning method is effective in localizing a moving vehicle with better accuracy.

Ayalew Belay Habtie, Ajith Abraham, Dida Midekso

Parallelizing NSGAII for Accelerating the Registration Areas Optimization in Mobile Communication Networks

In this work, we propose a parallel version of our adaptation of the Non-dominated Sorting Genetic Algorithm II (NSGAII) with the aim of reducing its execution time when solving the Registration Areas Planning Problem (RAPP), a problem that describes one of the most popular strategies to manage the subscribers’ movement in a mobile communication network. In this problem, the use of mobile activity traces is a good choice that allows us to assess the Registration Areas strategy accurately. However, due to the huge number of mobile subscribers, a mobile activity trace of a current network could contain several million events, which leads to a long execution time. That is the reason why we propose to parallelize our version of NSGAII in a shared memory system, using the OpenMP Application Program Interface. The quality and efficiency of our approach are shown by means of an experimental study.

Víctor Berrocal-Plaza, Miguel A. Vega-Rodríguez, Juan M. Sánchez-Pérez

Improving Hotel Room Demand Forecasting with a Hybrid GA-SVR Methodology Based on Skewed Data Transformation, Feature Selection and Parsimony Tuning

This paper presents a hybrid methodology in which a KDD scheme is optimized to build accurate parsimonious models. The methodology tries to find the best model by using genetic algorithms to optimize a KDD scheme formed of the following stages: feature selection, transformation of the skewed input and output data, parameter tuning, and parsimonious model selection. In this work, experiments demonstrated that optimization of these steps significantly improved the model generalization capabilities on some UCI databases. Finally, this methodology was applied to create parsimonious room demand models using booking databases from a hotel located in a region of Northern Spain. Results showed that the proposed method was useful for creating models with higher generalization capacity and lower complexity than those obtained with classical KDD processes.

R. Urraca, A. Sanz-Garcia, J. Fernandez-Ceniceros, E. Sodupe-Ortega, F. J. Martinez-de-Pison

A Survey of Hybrid Artificial Intelligence Algorithms for Dynamic Vehicle Routing Problem

In a Dynamic Vehicle Routing Problem (DVRP) new customer orders and changes of existing orders continually arrive and thus disrupt the optimal routing plan. This paper presents a survey of some of the recent hybrid artificial intelligence algorithms suitable for efficient optimization and re-optimization of different DVRPs. An artificial ant colony 2-OPT hybrid algorithm, a hybrid neighborhood search algorithm, and a hybrid heuristic algorithm are explained in detail. Particular interest is focused towards local improvement heuristic algorithms, such as 2-OPT algorithm and OR’s algorithm, which are regularly used in hybrid approaches for intra-route and inter-route improvements.

Vladimir Ilin, Dragan Simić, Jovan Tepić, Gordan Stojić, Nenad Saulić
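
The 2-OPT local improvement heuristic mentioned in the abstract above can be sketched in Python for a single static route. This is the textbook intra-route move (reverse a segment whenever that shortens the tour), not any of the surveyed hybrid algorithms. The coordinates and function names are hypothetical.

```python
import math


def tour_length(tour, points):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(tour, points):
    """Repeatedly reverse a segment of the tour while doing so shortens it."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges of the tour by two others.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, points) < tour_length(best, points) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical customers on the corners of a unit square, visited in an
# order that makes the route cross itself.
points = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]
better = two_opt(start, points)
print(tour_length(start, points), tour_length(better, points))
```

Recomputing the full tour length for every candidate keeps the sketch short; practical implementations evaluate only the change in length caused by the two exchanged edges.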

A Straightforward Implementation of a GPU-accelerated ELM in R with NVIDIA Graphic Cards

General purpose computing on graphics processing units (GPGPU) is a promising technique to cope with today’s emerging computational challenges, due to the suitability of GPUs for parallel processing. Several libraries and functions are being released to boost the use of GPUs in real-world problems. However, many of these packages require deep knowledge of GPU architecture and of low-level programming. As a result, end users have trouble exploiting the advantages of GPGPU. In this paper, we focus on the GPU acceleration of a prediction technique specially designed to deal with big datasets: the extreme learning machine (ELM). The intent of this study is to develop a user-friendly library in the open source R language and subsequently release the code at https://github.com/maaliam/EDMANS-elmNN-GPU.git. R users can therefore freely use it, with the only requirement being an NVIDIA graphics card. The most computationally demanding operations were identified by performing a sensitivity analysis. As a result, only matrix multiplications were executed on the GPU, as they take around 99 % of total execution time. A speedup of up to 15 times was obtained with this GPU-accelerated ELM in the most computationally expensive scenarios. Moreover, the applicability of the GPU-accelerated ELM was also tested with a typical case of model selection, in which genetic algorithms were used to fine-tune an ELM and training thousands of models is required. In this case, a speedup of 6 times was still obtained.

M. Alia-Martinez, J. Antonanzas, F. Antonanzas-Torres, A. Pernía-Espinoza, R. Urraca

Real Implantation of an Expert System for Elderly Home Care

This paper presents an intelligent system for the home care of elderly people that has been implemented and tested in real-life environments. The expert system is based on the principle of non-intrusion. It uses plug-and-play sensors and machine learning algorithms to learn the elderly person’s usual activity. If the system detects that something unusual is happening (in a broad sense), it sends a real-time alarm to the family, care center or medical agents, without human intervention. The system is currently running in dozens of homes with an accuracy greater than 81 %.

Aitor Moreno-Fernandez-de-Leceta, Unai Arenal Gómez, Jose Manuel Lopez-Guede, Manuel Graña

A Novel Hybrid Algorithm for Solving the Clustered Vehicle Routing Problem

This paper presents a new hybrid optimization approach based on a genetic algorithm and simulated annealing for solving the clustered vehicle routing problem (CluVRP). The problem investigated in this paper is an NP-hard combinatorial optimization problem that generalizes the classical vehicle routing problem (VRP) and is closely related to the generalized vehicle routing problem (GVRP). Preliminary computational results on two sets of benchmark instances are reported and discussed.

Andrei Horvat Marc, Levente Fuksz, Petrică C. Pop, Daniela Dănciulescu

Trading-off Accuracy vs Energy in Multicore Processors via Evolutionary Algorithms Combining Loop Perforation and Static Analysis-Based Scheduling

This work addresses the problem of energy efficient scheduling and allocation of tasks in multicore environments, where the tasks can permit certain loss in accuracy of either final or intermediate results, while still providing proper functionality. Loss in accuracy is usually obtained with techniques that decrease computational load, which can result in significant energy savings. To this end, in this work we use the loop perforation technique that transforms loops to execute a subset of their iterations, and integrate it in our existing optimisation tool for energy efficient scheduling in multicore environments based on evolutionary algorithms and static analysis for estimating energy consumption of different schedules. The approach is designed for multicore XMOS chips, but it can be adapted to any multicore environment with slight changes. The experiments conducted on a case study in different scenarios show that our new scheduler enhanced with loop perforation improves the previous one, achieving significant energy savings (31 % on average) for acceptable levels of accuracy loss.

Zorana Banković, Umer Liqat, Pedro López-García
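
The loop perforation technique described in the abstract above, transforming a loop so that it executes only a subset of its iterations, can be illustrated with a minimal Python sketch. It shows only the accuracy-for-work trade on a toy computation and says nothing about energy measurement or scheduling on XMOS chips. The function name and the `stride` parameter are hypothetical.

```python
def mean_perforated(values, stride=1):
    """Mean of `values`, visiting only every `stride`-th element.

    stride=1 runs the original loop; stride>1 perforates it, trading
    accuracy for fewer iterations. Returns (mean, iterations_executed).
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    total = 0.0
    iterations = 0
    for i in range(0, len(values), stride):  # perforated loop
        total += values[i]
        iterations += 1
    return total / iterations, iterations


data = list(range(100))
exact, full_iters = mean_perforated(data)
approx, kept_iters = mean_perforated(data, stride=4)
print(exact, full_iters)   # result of the unmodified loop
print(approx, kept_iters)  # approximate result from a quarter of the iterations
```

How much error is acceptable depends on the application, which is why the abstract speaks of acceptable levels of accuracy loss rather than a fixed perforation rate.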

Distributed Tabu Searches in Multi-agent System for Permutation Flow Shop Scheduling Problem

In this paper, we propose a distributed multi-agent approach to solve the permutation flow shop scheduling problem with the objective of minimizing the makespan. This approach consists of two types of agents that cooperate to find a solution to this problem: a mediator agent, which is responsible for generating the initial solution with the NEHT heuristic, and scheduler agents, each of which applies a tabu search to refine a specific sequence of jobs that differs from those of the other agents. Computational experiments confirm that our approach provides good results, equal to or better than those given by the other approaches with which we have made comparisons.

Olfa Belkahla Driss, Chaouki Tarchi

Content Based Image Retrieval for Large Medical Image Corpus

In this paper we address the scalability issue of content-based image retrieval in large image archives in the medical domain. Throughout the text we focus on explaining how small changes in image representation, using existing technologies, lead to impressive improvements in image indexing, search and retrieval duration. We used a combination of OpponentSIFT descriptors, Gaussian Mixture Models, Fisher kernel and Product quantization that is neatly packaged and ready for web integration. The CBIR feature of the system is demonstrated through a Python-based web client with features like region of interest selection and local image upload.

Gjorgji Strezoski, Dario Stojanovski, Ivica Dimitrovski, Gjorgji Madjarov

Twitter Sentiment Analysis Using Deep Convolutional Neural Network

In the work presented in this paper, we conduct experiments on sentiment analysis in Twitter messages by using a deep convolutional neural network. The network is trained on top of pre-trained word embeddings obtained by unsupervised learning on large text corpora. We use a CNN with multiple filters with varying window sizes, on top of which we add 2 fully connected layers with dropout and a softmax layer. Our research shows the effectiveness of using pre-trained word vectors and the advantage of leveraging Twitter corpora for the unsupervised learning phase. The experimental evaluation is made on benchmark datasets provided for the Sentiment analysis in Twitter task of the SemEval 2015 competition. Although the presented approach does not depend on hand-crafted features, we achieve performance comparable to state-of-the-art methods on the Twitter2015 set, with an F1 score of 64.85 %.

Dario Stojanovski, Gjorgji Strezoski, Gjorgji Madjarov, Ivica Dimitrovski

Backmatter
