
2010 | Book

Trends in Applied Intelligent Systems

23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2010, Cordoba, Spain, June 1-4, 2010, Proceedings, Part II

Edited by: Nicolás García-Pedrajas, Francisco Herrera, Colin Fyfe, José Manuel Benítez, Moonis Ali

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Engineering Knowledge and Semantic Systems

Improving Effectiveness of Query Expansion Using Information Theoretic Approach

Automatic query expansion is a well-known method for improving the performance of information retrieval systems. In this paper we suggest information-theoretic measures to improve the efficiency of co-occurrence-based automatic query expansion. We use a local approach based on pseudo-relevance feedback. The expansion terms are selected from the top N documents using a co-occurrence-based approach and then ranked using two different information-theoretic measures. The first is the standard Kullback-Leibler divergence (KLD). As a second measure we suggest a variant of KLD. Experiments were performed on the TREC-1 dataset. The results suggest that there is scope for improving co-occurrence-based query expansion by using information-theoretic measures. Extensive experiments were done to select two important parameters: the number of top N documents to be used and the number of terms to be used for expansion.
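
The abstract does not spell out the ranking formula, so the following is only a minimal sketch of one common way to score candidate expansion terms by their KLD contribution: each term's relative frequency in the pseudo-relevant (top N) documents is compared with its relative frequency in the whole collection. The function names, the tokenised-list input format and the tie-breaking rule are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def kld_term_scores(feedback_docs, collection_docs):
    """Score each feedback term by p_rel * log(p_rel / p_coll).

    Both arguments are lists of documents, each document being a list of
    tokens (an assumed input format). Terms that never occur in the
    collection are skipped to avoid division by zero.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        p_rel = count / fb_total
        p_coll = coll_counts[term] / coll_total
        if p_coll == 0:
            continue
        scores[term] = p_rel * math.log(p_rel / p_coll)
    return scores


def top_expansion_terms(feedback_docs, collection_docs, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = kld_term_scores(feedback_docs, collection_docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


collection = [["apple", "pie"], ["apple", "tart"],
              ["car", "engine"], ["car", "wheel"]]
feedback = collection[:2]  # pretend these are the top N retrieved documents
print(top_expansion_terms(feedback, collection, 2))
```

In this toy collection, terms concentrated in the feedback documents score above zero, while a term equally frequent in both sets would score exactly zero.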

Hazra Imran, Aditi Sharan
Defining Coupling Metrics among Classes in an OWL Ontology

This paper proposes some new metrics to measure relationships among classes in an ontology. Relationships among classes in an OWL ontology are given by the object properties, which are defined as binary relations between classes in the domain and classes in the range. Our proposal is based on the coupling metric defined in the software engineering field, adapted to an ontology's needs. We have implemented and tested our metrics with real ontologies, and the results are analysed and discussed.

Juan García, Francisco García, Roberto Therón
Enterprise 2.0 and Semantic Technologies for Open Innovation Support

In recent years, Web 2.0 has achieved a key role in the Internet community and concepts such as “the wisdom of crowds” have grown in importance in the enterprise context. Companies are adopting this paradigm for their internal processes in the so-called Enterprise 2.0. Semantic technology seems to be essential for its successful adoption with a positive return on investment. On the other hand, the introduction of the Open Innovation model, in which the innovation process is opened beyond the R&D department to all employees and external actors, requires a supporting technological infrastructure. In this paper we discuss how the Web 2.0 philosophy and Semantic Technology support Open Innovation and how three big European companies have profited from this new paradigm.

Francesco Carbone, Jesús Contreras, Josefa Z. Hernández
Algorithmic Decision of Syllogisms

A syllogism, also known as a rule of inference, is a formal logical scheme used to draw a conclusion from a set of premises. In a categorical syllogism, every premise and conclusion is given in the form of a quantified relationship between two objects. The syllogistic system consists of premises and conclusions systematically combined into so-called figures and moods. The syllogistic system is a theory for reasoning developed by Aristotle, who is known as one of the most important contributors to Western thought and logic. Since Aristotle, philosophers and sociologists have successfully modelled human thought and reasoning with syllogistic structures. However, a major shortcoming has been that the mathematical properties of the whole syllogistic system could not be fully revealed until now. Being able to calculate any syllogistic property exactly, using a single algorithm, could indeed facilitate modelling possibly any sort of consistent, inconsistent or approximate human reasoning. In this paper we present such an algorithm.

Bora İ. Kumova, Hüseyin Çakır
Matching Multilingual Tags Based on Community of Lingual Practice from Multiple Folksonomy: A Preliminary Result

By taking into account various co-occurrence patterns from a folksonomy, semantic correspondences between tags have been discovered and applied to a number of applications (e.g., recommendation). In this paper, we propose a novel collective intelligence application for expanding and transforming queries when searching for multilingual resources. To this end, multilingual tags (e.g., between ‘Seoul’ in English and ‘Coree’ in French) within a folksonomy have been analyzed to determine whether they have a significant relationship or not. We have tested the proposed multilingual tag matching method by collecting real-world tagging information from several well-known social tagging websites (e.g., Del.icio.us), and applied it to translating queries into other languages without any external dictionary.

Jason J. Jung

Ensemble Learning: Methods and Applications

Multiclass Mineral Recognition Using Similarity Features and Ensembles of Pair-Wise Classifiers

Mineral determination is a basis of petrography. Automatic mineral classification based on digital image analysis is becoming very popular. To improve classification accuracy we consider similarity features, complex one-stage classifiers and two-stage classifiers based on simple pair-wise classification algorithms. Results show that two-stage classifiers with proper parameters or a K-class single-layer perceptron are good choices for mineral classification. Similarity features with properly selected parameters allow non-linear decision boundaries to be obtained and lead to a sizeable decrease in the classification error rate.

Rimantas Kybartas, Nurdan Akhan Baykan, Nihat Yilmaz, Sarunas Raudys
Ensembles of Probability Estimation Trees for Customer Churn Prediction

Customer churn prediction is one of the most important elements of a company’s Customer Relationship Management (CRM) strategy. In this study, two strategies are investigated to increase the lift performance of ensemble classification models, i.e. (i) using probability estimation trees (PETs) instead of standard decision trees as base classifiers, and (ii) implementing alternative fusion rules based on lift weights for the combination of ensemble members’ outputs. Experiments are conducted for four popular ensemble strategies on five real-life churn data sets. In general, the results demonstrate how lift performance can be substantially improved by using alternative base classifiers and fusion rules. However, the effect varies for the different ensemble strategies. In particular, the results indicate an increase in lift performance of (i) Bagging by implementing C4.4 base classifiers, (ii) the Random Subspace Method (RSM) by using lift-weighted fusion rules, and (iii) AdaBoost by implementing both.
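
Lift is the evaluation measure this abstract revolves around, but the abstract does not define it. As a hedged illustration only, the sketch below computes lift in its usual sense: the churn rate among the highest-scored fraction of customers divided by the overall churn rate. The function name, the `fraction` parameter and the toy data are assumptions, and the paper may use a different variant.

```python
def top_fraction_lift(scores, labels, fraction=0.1):
    """Lift of the top `fraction` of customers ranked by predicted score.

    scores: predicted churn probabilities (higher means more likely to churn)
    labels: 1 for churners, 0 for non-churners
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_top = max(1, int(len(scores) * fraction))
    # Rank customers from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(top_fraction_lift(scores, labels, fraction=0.2))
```

A lift above 1 means the model concentrates churners in the top of the ranking better than random selection would.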

Koen W. De Bock, Dirk Van den Poel
Evolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering

The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background information usually exists in the form of a reduced number of similarity/dissimilarity constraints. In this context, the current paper investigates a method to perform feature selection and clustering simultaneously. The benefits of a semi-supervised approach making use of reduced external information are highlighted against an unsupervised approach. The method makes use of an ensemble of near-optimal feature subsets delivered by a multi-modal genetic algorithm in order to quantify the relative importance of each feature to clustering.

Mihaela Elena Breaban
Building a New Classifier in an Ensemble Using Streaming Unlabeled Data

It is expensive and impractical to manually label all samples in real-world streaming data when the correct class is not available in real time. In this paper, we propose an ensemble method for determining which samples from streaming unlabeled data should be labeled, and when they should be labeled, according to changes in the distribution of the streaming unlabeled data. In particular, the point in time at which labeling occurs is an important practical factor in building an efficient ensemble. In order to evaluate the performance of our ensemble method, we used synthetic streaming data with concept drift and the intrusion detection data from the KDD’99 Cup. We compared the results of the proposed method with those of existing ensemble methods that periodically build new classifiers for an ensemble. On the synthetic streaming data, the proposed method produced on average 14.1% higher classification accuracy, and the number of new classifiers was reduced by 12.6% on average. With the intrusion detection data, our method produced similar accuracy to existing methods but used only 0.007% of the labeled streaming data.

Mehmed Kantardzic, Joung Woo Ryu, Chamila Walgampaya
Random Projections for SVM Ensembles

Data projections have been used extensively to reduce input space dimensionality. Such reduction is useful for obtaining faster results, and can sometimes help to discard unnecessary or noisy input dimensions. Random Projections (RP) can be computed faster than other methods such as Principal Component Analysis (PCA). This paper presents an experimental study over 62 UCI datasets of three types of RPs, taking into account the size of the projected space and using linear SVMs as base classifiers. We also combined random projections with the sparse matrix strategy used by Rotation Forests, a method that is also based on projections. Results show that Random Projections tend to be better than PCA for SVM ensembles.
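
The abstract does not say which three types of random projection were studied. Purely as an illustration of the general idea, the sketch below builds a Gaussian random projection matrix (one common choice, assumed here) and uses it to map data to a lower-dimensional space; in an ensemble, each member would typically draw its own matrix from a different seed. All names and the scaling factor are illustrative assumptions.

```python
import random


def gaussian_projection_matrix(n_in, n_out, seed=0):
    """Return an n_in x n_out matrix with i.i.d. N(0, 1/n_out) entries."""
    rng = random.Random(seed)
    scale = 1.0 / (n_out ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_out)]
            for _ in range(n_in)]


def project(rows, matrix):
    """Multiply each data row (length n_in) by the projection matrix."""
    n_in, n_out = len(matrix), len(matrix[0])
    projected = []
    for row in rows:
        if len(row) != n_in:
            raise ValueError("row length does not match the projection matrix")
        projected.append([sum(row[i] * matrix[i][j] for i in range(n_in))
                          for j in range(n_out)])
    return projected


data = [[1.0, 0.0, 2.0, 0.5, 3.0],
        [0.0, 1.0, 0.0, 1.5, 2.0],
        [0.0, 0.0, 0.0, 0.0, 0.0]]
matrix = gaussian_projection_matrix(n_in=5, n_out=2, seed=42)
reduced = project(data, matrix)  # three rows, each with two features
print(reduced)
```

Unlike PCA, building the matrix requires no pass over the data, which is why random projections are cheap to compute.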

Jesús Maudes, Juan José Rodríguez, César García-Osorio, Carlos Pardo
Rotation Forest on Microarray Domain: PCA versus ICA

Rotation Forest (RF) is an ensemble method that has shown effectiveness on microarray data set classification problems. RF works by generating sparse rotation matrices of the input space, a method that creates accurate and diverse base classifiers. In its original formulation, elemental rotations were obtained by Principal Component Analysis (PCA). However, for microarray data sets, Independent Component Analysis (ICA) may be a better option. In this paper, an experimental study on ten microarray data sets has been performed. The study confirms that, except for a small number of attributes, Rotation Forest outperforms Bagging and Boosting on this domain. However, RF with ICA does not generally improve on RF with PCA.

Carlos J. Alonso-González, Q. Isaac Moro-Sancho, Iván Ramos-Muñoz, M. Aránzazu Simón-Hurtado
An Empirical Study of Multilayer Perceptron Ensembles for Regression Tasks

This work presents an experimental study of ensemble methods for regression, using Multilayer Perceptrons (MLP) as the base method and 61 datasets. The ensemble methods considered are Randomization, Random Subspaces, Bagging, Iterated Bagging and AdaBoost.R2. Surprisingly, because it contradicts previous studies, the best overall results are for Bagging. The cause of this difference may be the base method: MLP instead of regression or model trees. Diversity-error diagrams are used to analyze the behaviour of the ensemble methods. Compared to Bagging, the additional diversity obtained with the other methods does not compensate for the increase in the errors of the ensemble members.

Carlos Pardo, Juan José Rodríguez, César García-Osorio, Jesús Maudes
Ensemble Methods and Model Based Diagnosis Using Possible Conflicts and System Decomposition

This work presents an on-line diagnosis algorithm for dynamic systems that combines model-based diagnosis and machine learning techniques. The Possible Conflicts (PCs) method is used to perform consistency-based diagnosis, providing fault detection and isolation. Machine learning methods are used to induce time series classifiers, which are applied on-line for fault identification. The main contribution of this work is that Possible Conflicts are used to decompose the physical system, defining the input-output structure of an ensemble of classifiers. Experimental results on a simulated pilot plant show that the ensemble created from PCs decomposition has significant potential to increase the accuracy of individual classifiers for several learning algorithms. Without PCs decomposition, the best results were for another ensemble method, Stacking. These results are improved when combining Stacking with PCs decomposition.

Carlos J. Alonso-González, Juan José Rodríguez, Óscar J. Prieto, Belarmino Pulido

Evolutionary Computation and Applications

Entropy-Based Evaluation Relaxation Strategy for Bayesian Optimization Algorithm

Bayesian Optimization Algorithm (BOA) belongs to the advanced evolutionary algorithms (EA) capable of solving problems with multivariate interactions. However, to attain wide applicability in real-world optimization, BOA needs to be coupled with various efficiency enhancement techniques. A BOA incorporated with a novel entropy-based evaluation relaxation method (eBOA) is developed in this regard. Composed of an on-demand evaluation strategy (ODES) and a sporadic evaluation method, eBOA significantly reduces the number of (fitness) evaluations without imposing any larger population-sizing requirement. Experiments adduce the grounds for its significant improvement in the number of evaluations until reliable convergence. Furthermore, the evaluation relaxation does not negatively affect the scalability performance.

Hoang Ngoc Luong, Hai Thanh Thi Nguyen, Chang Wook Ahn
A New Artificial Immune System for Solving the Maximum Satisfiability Problem

In this paper we investigate the use of Artificial Immune Systems’ principles to cope with the satisfiability problem. We describe ClonSAT, a new iterative approach for solving the well-known Maximum Satisfiability (Max-SAT) problem. The latter has been shown to be NP-hard if the number of variables per clause is greater than 3. The underlying idea is to harness the optimization capabilities of the artificial clonal selection algorithm to achieve good-quality solutions for the Max-SAT problem. To foster the process, a local search has been used. The obtained results are very encouraging and show the feasibility and effectiveness of the proposed hybrid approach.

Abdesslem Layeb, Abdel Hakim Deneche, Souham Meshoul
Power-Aware Multi-objective Evolutionary Optimization for Application Mapping on NoC Platforms

Networks-on-chip (NoCs) are considered the next generation of communication infrastructure, which will be omnipresent in different environments. In the platform-based design methodology, an application is implemented by a set of collaborating intellectual property (IP) blocks. The selection of the most suitable set of IPs, as well as their physical mapping onto the NoC to implement the application at hand efficiently, are two hard combinatorial problems. In this paper, we propose an innovative power-aware multi-objective evolutionary algorithm to perform the assignment and mapping stages of a platform-based NoC design synthesis tool. Our algorithm can use one of the well-known multi-objective evolutionary algorithms NSGA-II and microGA as its kernel. The optimization is driven by the required area and the imposed execution time considering that the decision maker’s is the power consumption of the implementation.

Marcus Vinícius Carvalho da Silva, Nadia Nedjah, Luiza de Macedo Mourelle
A Discrete Differential Evolution Algorithm for Solving the Weighted Ring Arc Loading Problem

Resilient Packet Ring (RPR) is a recent telecommunication transport technology that combines the appealing functionalities of Synchronous Optical Network/Synchronous Digital Hierarchy networks with the advantages of Ethernet networks. To use RPR's potential effectively, namely spatial reuse, statistical multiplexing and bi-directionality, it is necessary to route the demands efficiently. Given a set of point-to-point unidirectional traffic demands of a specified bandwidth, the demands should be assigned to the clockwise or to the counter-clockwise ring in order to yield the best performance. This paper suggests an efficient load balancing algorithm, Discrete Differential Evolution (DDE). We compare our results with those obtained by the Genetic Algorithm, Differential Evolution, Tabu Search and Particle Swarm Optimisation algorithms used in the literature. The simulation results verify the effectiveness of the DDE.
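
To make the routing decision concrete, the sketch below evaluates one candidate solution: given a direction (clockwise or counter-clockwise) for every demand, it accumulates the load on each directed arc of the ring and reports the maximum. Treating the maximum arc load as the quantity to minimise is an assumption about the objective, as are the data layout and function names; the abstract only says that demands are assigned to one of the two directions "to yield the best performance".

```python
def arc_loads(n_nodes, demands, clockwise):
    """Accumulate loads on the directed arcs of a ring with n_nodes nodes.

    demands:   list of (source, target, weight) tuples
    clockwise: list of booleans, one per demand (True = clockwise)
    Returns (cw, ccw), where cw[i] is the load on arc i -> i+1 and
    ccw[i] is the load on arc i+1 -> i (node indices taken modulo n_nodes).
    """
    cw = [0] * n_nodes
    ccw = [0] * n_nodes
    for (source, target, weight), go_clockwise in zip(demands, clockwise):
        node = source
        while node != target:
            if go_clockwise:
                cw[node] += weight
                node = (node + 1) % n_nodes
            else:
                node = (node - 1) % n_nodes
                ccw[node] += weight
    return cw, ccw


def max_arc_load(n_nodes, demands, clockwise):
    """Objective value of one assignment: the most heavily loaded arc."""
    cw, ccw = arc_loads(n_nodes, demands, clockwise)
    return max(max(cw), max(ccw))


demands = [(0, 2, 5), (1, 3, 3)]
print(max_arc_load(4, demands, [True, True]))   # both demands clockwise
print(max_arc_load(4, demands, [True, False]))  # second demand reversed
```

A search method such as the one named in the abstract would explore the space of boolean direction vectors, using a function like `max_arc_load` as its fitness evaluation.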

Anabela Moreira Bernardino, Eugénia Moreira Bernardino, Juan Manuel Sánchez-Pérez, Juan Antonio Gómez-Pulido, Miguel Angel Vega-Rodríguez
A Parallel Genetic Algorithm on a Multi-Processor System-on-Chip

The aim of the work described in this paper is to investigate migration strategies for the execution of parallel genetic algorithms in a Multi-Processor System-on-Chip (MPSoC). Some multimedia and Internet applications for wireless communications use genetic algorithms and can benefit from the advantages provided by parallel processing on MPSoCs. In order to run such algorithms, we use a Network-on-Chip platform, which provides the interconnection network required for communication between processors. Two migration strategies are employed in order to analyze the speedup and efficiency each one can provide, considering the communication costs they require.

Rubem Euzébio Ferreira, Luiza de Macedo Mourelle, Nadia Nedjah
The Influence of Using Design Patterns on the Process of Implementing Genetic Algorithms

The design of a genetic algorithm is mainly based on trial and error. The aim of the research performed in this work was to examine how using software design patterns in a genetic algorithm implementation affects the process of modifying the algorithm. Additionally, specific patterns were evaluated from the point of view of their contribution to reducing the difficulty of modifying a system.

Urszula Markowska-Kaczmar, Filip Krygowski

Fuzzy Systems and Applications

Obtaining Significant Relations in L-Fuzzy Contexts

We use linguistic variables in order to obtain significant relations in L-Fuzzy contexts (L, X, Y, R) that allow us to extract complete information from the L-Fuzzy context. We analyze, in particular, the case of anomalous values in the relation R of the L-Fuzzy context, proposing a replacement method for the case where they are erroneous.

Cristina Alcalde, Ana Burusco, Ramón Fuentes-González
Knowledge Extraction Based on Fuzzy Unsupervised Decision Tree: Application to an Emergency Call Center

This paper describes the application of a fuzzy version of Unsupervised Decision Tree (UDT) to the problem of an emergency call center. The goal is to obtain a decision support system that helps in resource planning, reaching a trade-off between efficiency and quality of service. To reach this objective, the different types of days have been characterized based on variables that permit the assignment of available resources in an easy and understandable way. In order to deal with availability of expert knowledge on the problem, an unsupervised methodology had to be used, so fuzzy UDT is a solution merging decision trees and clustering, providing the performance of both viewpoints. Quality indexes give criteria for the selection of a reasonable solution to the complexity, as well as interpretability of the trees and the quality of generated clusters, and also the type of days and the performance from the resources point of view.

Francisco Barrientos, Gregorio Sainz
Optimization of Embedded Fuzzy Rule-Based Systems in Wireless Sensor Network Nodes

There is currently growing interest in the integration of artificial intelligence technologies, such as neural networks and fuzzy logic, into Wireless Sensor Networks. However, little attention has been paid to integrating knowledge-based systems into such networks. The objective of this work is to optimize the design of a distributed Fuzzy Rule-Based System embedded in Wireless Sensor Networks. The proposed system is composed of: a central computer, which includes a module for editing knowledge bases, reducing redundant rules and transforming knowledge bases with linguistic labels into ones without labels; an access point; a sensor network; a communication protocol; and Fuzzy Rule-Based Systems adapted to be executed in a sensor. Results have shown that, starting from knowledge bases generated by a human expert, it is possible to obtain an optimized one with a design of rules adapted to the problem, and a reduction in the number of rules without a substantial decrease in accuracy. Results have also shown that the use of optimized knowledge bases increases sensor performance, decreasing run time and battery consumption. To illustrate these results, the proposed methodology has been applied to model the behavior of agricultural pests.

Manuel-Ángel Gadeo-Martos, Jose-Ángel Fernández-Prieto, Joaquín Canada Bago, Juan-Ramón Velasco
An Algorithm for Online Self-organization of Fuzzy Controllers

This work presents a fuzzy controller capable of designing its own structure online, based on the data obtained during the normal system operation. The algorithm does not use previous information about the differential equations that define the plant’s behaviour. The controller may be initially empty, as the method is able to distinguish the important input variables by assigning them more membership functions. With this aim, the method works in two stages: the adaptation of the consequents for every tested topology and the online addition of new membership functions. To show its capabilities, simulation results with a liquid tank system are analyzed.

Ana Belén Cara, Héctor Pomares, Ignacio Rojas
A Mechanism of Output Constraint Handling for Analytical Fuzzy Controllers

In the proposed mechanism the output constraints are handled in a relatively easy way. The method is based on the prediction generation known from MPC (Model Predictive Control) algorithms. It can, however, be used with practically any analytical fuzzy controller. The big advantage of the proposed mechanism is the possibility of taking into account the influence of the control action many sampling instants ahead. Therefore, the constraint handling can offer very good control performance.

Piotr M. Marusak
Analysis of the Performance of a Semantic Interpretability-Based Tuning and Rule Selection of Fuzzy Rule-Based Systems by Means of a Multi-Objective Evolutionary Algorithm

Recently, a semantic interpretability index has been proposed to preserve the semantic interpretability of Fuzzy Rule-Based Systems while a tuning of the membership functions is performed. In this work, we extend the proposed multi-objective evolutionary algorithm in order to analyze the performance of the tuning based on this semantic interpretability index while it is combined with a rule selection. To this end, the following three objectives have been considered: error and complexity minimization, and semantic interpretability maximization.

The analyzed method is compared to a single-objective algorithm and to the previous approach on two problems, showing that many solutions in the Pareto front dominate those obtained by these methods.

María José Gacto, Rafael Alcalá, Francisco Herrera
Testing for Heteroskedasticity of the Residuals in Fuzzy Rule-Based Models

In this paper, we propose a new diagnostic checking tool for fuzzy rule-based modelling of time series. Through the study of the residuals in the Lagrange Multiplier testing framework we devise a hypothesis test which allows us to determine if the residual time series is homoscedastic or not, that is, if it has the same variance throughout time. This is another important step towards a statistically sound modelling strategy for fuzzy rule-based models.

José Luis Aznarte M., José M. Benítez

Heuristic Methods and Swarm Intelligence for Optimization

Heuristic Methods Applied to the Optimization School Bus Transportation Routes: A Real Case

The problem discussed in this paper is similar to the Vehicle Routing Problem (VRP); however, new contributions are proposed. In this work a heuristic algorithm is proposed to determine the set of bus stops. A new approach is proposed to construct digital maps containing the roads on which the vehicles will be able to travel, since there are no digital maps of these regions. The real distances between the points are calculated, and the Location Based Heuristic, with some additional features, was used to propose the new routes. The algorithm was named Adapted Location Based Heuristic (ALBH). The School Transportation Problem was implemented in the State of Parana for 399 cities. We present here the results obtained for 10 of the 399 cities. The results obtained by using this approach showed improvement in the daily distance travelled and in the number of vehicles used to do the job.

Luzia Vidal de Souza, Paulo Henrique Siqueira
Particle Swarm Optimization in Exploratory Data Analysis

We discuss extensions of particle swarm based optimization (PSO) algorithms in the context of exploratory data analysis. In particular, we apply these extensions to principal component analysis, exploratory projection pursuit and topology preserving mappings. Our extensions include combining PSO algorithms with stochastic sampling and a form of reinforcement learning known as Q-learning. We illustrate on a variety of artificial data sets and show that our new results are better than previous results on such data sets.

Ying Wu, Colin Fyfe
Using the Bees Algorithm to Assign Terminals to Concentrators

With the recent growth of communication networks, a large variety of combinatorial optimization problems have appeared. One of these problems is the Terminal Assignment Problem. The main objective is to assign a given set of terminals to a given set of concentrators. In this paper, we propose the Bees Algorithm to assign terminals to concentrators. The algorithm performs a kind of neighbourhood search and uses a local search method to locate the global minimum. The Bees Algorithm is a swarm-based optimization algorithm that mimics the natural behaviour of honey bees. We show that the Bees Algorithm is able to achieve feasible solutions to Terminal Assignment instances, improving the results obtained by previous approaches.

Eugénia Moreira Bernardino, Anabela Moreira Bernardino, Juan Manuel Sánchez-Pérez, Juan Antonio Gómez-Pulido, Miguel Angel Vega-Rodríguez
Multicriteria Assignment Problem (Selection of Access Points)

This paper addresses the assignment of users to access points of a wireless network. The considered problem is based on a multicriteria assignment model. The set of examined criteria involves the following: (i) maximum bandwidth, (ii) number of users that are under service at the same time, (iii) network reliability requirements, etc. Two kinds of resource constraints are examined: (a) the number of users under service for each access point, (b) the frequency bandwidth that is provided by each access point. The considered optimization problem is NP-hard and a heuristic is proposed. Numerical examples illustrate the approach.

Mark Sh. Levin, Maxim V. Petukhov
Composite Laminates Buckling Optimization through Lévy Based Ant Colony Optimization

In this paper, the authors propose the use of the Lévy probability distribution as the leading mechanism for solution differentiation in an efficient, bio-inspired optimization algorithm, ant colony optimization in continuous domains, ACOR. In the classical ACOR, new solutions are constructed starting from one solution, selected from an archive, where a Gaussian distribution is used for parameter diversification. In the proposed approach, Lévy probability distributions are introduced in the solution construction step in order to couple the ACOR algorithm with the exploration properties of the Lévy distribution. The proposed approach has been tested on mathematical test functions and on a real-world structural engineering problem, the maximization of the buckling load of composite laminates. In the latter case, as in many other real-world problems, the function to be optimized is multi-modal, and thus the exploration ability of the Lévy perturbation operator allows better results to be attained.

Roberto Candela, Giulio Cottone, Giuseppe Fileccia Scimemi, Eleonora Riva Sanseverino
Teaching Assignment Problem Solver

In this paper, we describe an extension of the backtracking with look-ahead forward checking method that adopts weighted partial satisfaction of soft constraints and has been applied in the development of an automated teaching assignment timetabling system. Determining the optimal solution for a teaching assignment problem is a challenging task. The objective is to construct a timetable for professors from already scheduled courses that satisfies both hard constraints (problem requirements, such as that no teacher should be assigned two courses at the same time) and soft constraints (teacher preferences), based on a fairness principle in distributing courses among professors. The approach mainly modifies the variable selection method and the value assignment technique, taking into account preferences and the fairness principle. The optimized look-ahead backtracking method applied to the solution is presented and discussed along with computational results.

Ali Hmer, Malek Mouhoub
Swarm Control Designs Applied to a Micro-Electro-Mechanical Gyroscope System (MEMS)

This paper analyzes the non-linear dynamics of a MEMS gyroscope system, modeled with a proof mass constrained to move in a plane with two resonant modes, which are nominally orthogonal. The two modes are ideally coupled only by the rotation of the gyro about the plane’s normal vector. We demonstrate that this model has unstable behavior. Control problems consist of attempts to stabilize a system to an equilibrium point, a periodic orbit, or, more generally, about a given reference trajectory. We also develop a particle swarm optimization technique for reducing the oscillatory movement of the nonlinear system to a periodic orbit.

Fábio Roberto Chavarette, José Manoel Balthazar, Ivan Rizzo Guilherme, Orlando Saraiva do Nascimento Jr.

Industrial Applications of Data Mining: New Paradigms for New Challenges

A Representation to Apply Usual Data Mining Techniques to Chemical Reactions

Chemical reactions always involve several molecules of two types, reactants and products. Existing data mining techniques, e.g. Quantitative Structure Activity Relationship (QSAR) methods, deal with individual molecules only. In this article, we propose to use the Condensed Graph of Reaction (CGR) approach, merging all molecules involved in a reaction into one molecular graph. This allows one to consider reactions as pseudo-molecules and to develop QSAR models based on fragment descriptors. Here ISIDA fragment descriptors calculated from CGRs have been used to build quantitative models for the rate constant of $\mathrm{S_N2}$ reactions in water. Three common attribute-value regression algorithms (linear regression, support vector machine, and regression trees) have been evaluated.

Frank Hoonakker, Nicolas Lachiche, Alexandre Varnek, Alain Wagner
Incident Mining Using Structural Prototypes

Software and other technical products offered to a mass market have a high demand on support and help desks. A tool for automated classification of incident reports, errors and other customer requests which offers previous (successful) hints or solution procedures could efficiently decrease support costs. We propose an approach to mining incidents and other customer requests for support based on generalising structural prototypes from structured data. Retrieval can then be efficiently realised by matching incoming requests against prototypes. We present an application to incident reports in an SAP business information system. Several variants of structure generalisation algorithms were realised and performance for an example test base was evaluated with promising results.

Ute Schmid, Martin Hofmann, Florian Bader, Tilmann Häberle, Thomas Schneider
Viability of an Alarm Predictor for Coffee Rust Disease Using Interval Regression

We present a method to formulate predictions regarding continuous variables using regressors able to predict intervals rather than single points. They can be learned explicitly using the so-called insensitive zone of regression Support Vector Machines (SVM). The motivation for this research is the study of a real case; we discuss the feasibility of an alarm system for coffee rust, the main coffee crop disease in the world. The objective is to predict whether the percentage of infected coffee leaves (the incidence of the disease) will be above a given threshold. The requirements of such a system include avoiding false negatives, since these would mean that the disease is not prevented. The aim of reliable predictions, on the other hand, is to use chemical prevention of the disease only when necessary in order to obtain healthier products and reductions in costs and environmental impact. Although widening the predicted intervals improves the reliability of predictions, it also increases the number of uncertain situations, i.e. those whose predictions include incidences both below and above the threshold. These cases would require deeper analysis. Our conclusion is that it is possible to reach a trade-off that makes the implementation of an alarm system for coffee rust disease feasible.
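
The three outcomes described in the abstract (alarm, no alarm, uncertain) can be illustrated with a minimal sketch. It assumes that a regressor has already produced a lower and an upper bound for the incidence; the names `Alarm` and `classify_interval` and the handling of the boundary case are illustrative assumptions, not the authors' implementation.

```python
from enum import Enum


class Alarm(Enum):
    NO_ALARM = "no alarm"
    ALARM = "alarm"
    UNCERTAIN = "uncertain"


def classify_interval(lower: float, upper: float, threshold: float) -> Alarm:
    """Map a predicted incidence interval [lower, upper] to an alarm decision.

    Assumption: "above the threshold" is read as strictly greater, so an
    interval that touches the threshold from below is still "no alarm".
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > threshold:
        # The whole interval lies above the threshold.
        return Alarm.ALARM
    if upper <= threshold:
        # No value in the interval exceeds the threshold.
        return Alarm.NO_ALARM
    # The interval straddles the threshold: deeper analysis is needed.
    return Alarm.UNCERTAIN
```

A wider interval makes the `UNCERTAIN` branch more likely, which is the trade-off between reliability and the number of undecided cases that the abstract discusses.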

Oscar Luaces, Luiz Henrique A. Rodrigues, Carlos Alberto Alves Meira, José R. Quevedo, Antonio Bahamonde
Prediction of Web Goodput Using Nonlinear Autoregressive Models

Performance prediction is a key part of modern network traffic engineering. In this paper we present the application of nonlinear autoregressive modeling to the prediction of goodput level in web transactions. We propose a two-stage approach, with a clustering step on historical data prior to classification, to determine the most appropriate traffic intensity levels. Our study is based on the data collected by the MWING system, an ensemble of web performance measurement agents, and covers over a year of continuous observations of a group of HTTP servers.

Maciej Drwal, Leszek Borzemski
Domain Driven Data Mining for Unavailability Estimation of Electrical Power Grids

In Brazil, power generating, transmitting and distributing companies operating in the regulated market are paid for their equipment availability. In case of system unavailability, the companies are financially penalized, and more severely so for unplanned interruptions. This work presents a domain driven data mining approach for estimating the risk of systems’ unavailability based on the historical data of their component equipment, within one of the biggest Brazilian electric sector companies. Traditional statistical estimators are combined with the concepts of Recency, Frequency and Impact (RFI) for producing variables containing behavioral information finely tuned to the application domain. The unavailability costs are embedded in the problem modeling strategy. Logistic regression models bagged via their median score achieved Max_KS=0.341 and AUC_ROC=0.699 on the out-of-time data sample. This performance is much higher than that of the previous approaches attempted within the company. The system has been put in operation and will be monitored for performance re-assessment and maintenance re-planning.
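
As a rough illustration of the two quantities named in the abstract, the sketch below combines the scores of several models by their median and computes the maximum Kolmogorov-Smirnov distance between the score distributions of the two classes. The function names, the 0/1 label encoding and the toy data are assumptions made for this example; the abstract does not describe the authors' code.

```python
from statistics import median
from typing import List, Sequence


def median_bagged_score(scores_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model scores (one sequence per model, aligned by example)
    into a single score per example by taking the median across models."""
    return [median(column) for column in zip(*scores_per_model)]


def max_ks(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum absolute difference between the empirical score CDFs of the
    positive (label 1) and negative (label 0) classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

For example, `max_ks([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])` gives 1.0 because the two classes are perfectly separated, while identical score distributions give 0.0.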

Paulo J. L. Adeodato, Petrônio L. Braga, Adrian L. Arnaud, Germano C. Vasconcelos, Frederico Guedes, Hélio B. Menezes, Giorgio O. Limeira

Intelligent Agent-Based Systems

Social Order in Hippocratic Multi-Agent Systems

In multi-agent applications, users delegate their sensitive data to autonomous agents that interact with other autonomous agents. In this context, privacy preservation is an important topic. In previous work, considering this problem, we have proposed the Hippocratic Multi-Agent System model (HiMAS). In this paper, we focus on the regulation of agents’ behavior with respect to privacy management in this model. We present a social order approach based on trust and reputation that installs a decentralised regulation of privacy management in HiMAS systems.

Ludivine Crépin, Yves Demazeau, Olivier Boissier, François Jacquenet
Building an Electronic Market System

An electronic market system is predicated on three technologies: data mining, intelligent trading agents and virtual institutions in which informed trading agents can trade securely both with each other and with human agents in a natural way. This paper describes a demonstrable prototype electronic market that integrates these three technologies and is available on the World Wide Web. This is part of a larger project that aims to make informed automated trading a reality.

Elaine Lawrence, John Debenham
Information Theory Based Intelligent Agents

Electronic and mobile trading environments are saturated with information from the Internet and the World Wide Web. This paper proposes agents that can assimilate and use real-time information flows wisely to automate contract negotiation reliably. A new breed of “information-based” agents are founded on concepts from information theory, and are designed to operate with information flows of varying and questionable integrity. These agents are part of a larger project that aims to make informed automated trading in applications such as electronic or mobile procurement a reality.

Elaine Lawrence, John Debenham
A Possibilistic Approach to Goal Generation in Cognitive Agents

We propose a theoretical framework, grounded in possibility theory, to account for all the aspects involved in representing and changing beliefs, representing and generating justified desires, and selecting goals based on current beliefs about the world and the preferences of an agent.

Célia da Costa Pereira, Andrea G. B. Tettamanzi
Modelling Greed of Agents in Economical Context

A classical debate in economics addresses the advantages and drawbacks of modelling from a macroeconomics perspective as opposed to modelling from a microeconomics perspective. From the latter perspective, psychological aspects at an individual level can be taken into account in a differentiated manner. Within computer science and AI, a similar debate exists about the differences between agent-based and population-based modelling. This paper aligns both debates by exploring the differences and commonalities between population-based and agent-based modelling in an economical context. A case study is performed on the interplay between individual greed as a psychological concept and global economical concepts. It is shown that under certain conditions agent-based and population-based simulations show similar results.

Tibor Bosse, Ghazanfar F. Siddiqui, Jan Treur
Modeling and Verifying Agent-Based Communities of Web Services

Communities of web services are virtual spaces that can dynamically gather different web services having complementary functionalities in order to provide composite services. In the last two years, some approaches have been proposed using multi-agent systems to organize communities of web services. This trend has increased the flexibility but also the system complexity. The system becomes hard to check by simply inspecting its model. Therefore, model checking, which is a well-established formal technique for verifying communication and cooperation in multi-agent systems, is used in this paper to verify the system correctness in terms of satisfying desirable properties. The approach presented in the paper is used to verify communities of web services modeled as UML activity diagrams. We first translate the activity diagram into an interpreted system model using predefined transformation rules. Specifications are expressed as formulae in a logic extending the Computation Tree Logic CTL* with agent commitments needed for their communication. Then, both the model and formulae are used as inputs for the multi-agent symbolic model checker MCMAS. We illustrate our approach with a short case study, in which we show how communication properties of simulated communities are verified.

Wei Wan, Jamal Bentahar, Abdessamad Ben Hamza

Interactive and Cognitive Environments

An Ambient Intelligent Agent Model Based on Behavioural Monitoring and Cognitive Analysis

This paper proposes a way in which cognitive models can be exploited in practical applications in the context of Ambient Intelligence. A computational model is introduced in which a cognitive model that addresses some aspects of human functioning is taken as a point of departure. From this cognitive model relationships between cognitive states and behavioural aspects affected by these states are determined. Moreover, representation relations for cognitive states are derived, relating them to external events such as stimuli that can be monitored. Furthermore, by automatic verification of the representation relations on monitoring information the occurrence of cognitive states affecting the human behaviour is determined. In this way the computational model is able to analyse causes of behaviour.

Alexei Sharpanskykh, Jan Treur
The Combination of a Causal and Emotional Learning Mechanism for an Improved Cognitive Tutoring Agent

This paper describes a Conscious Tutoring System (CTS) capable of dynamic fine-tuned assistance to users. We put forth the combination of a Causal Learning and Emotional learning mechanism within CTS that will allow it to first establish, through data mining algorithms, gross user group models. CTS then uses these models to find the cause of mistakes made by users, evaluate their performance, predict their future behavior, and, through a Pedagogical Knowledge mechanism, decide which tutoring intervention fits best.

Usef Faghihi, Philippe Fournier-Viger, Roger Nkambou, Pierre Poirier
Driver’s Behavior Assessment by On-board/Off-board Video Context Analysis

In the last few years, the application of ICT in the automotive field has taken an increasing role in improving both safety and driving comfort. In this context, systems capable of determining the traffic situation and/or driver behavior through the analysis of signals from multiple sensors (e.g. radar, cameras, etc.) are the subject of active research in both industrial and academic sectors. The extraction of contextual information through the analysis of video streams captured by cameras can therefore have implications in many applications focused both on prevention of incidents and on provision of useful information to drivers. In this paper, we study and implement algorithms for the extraction of context data from on-board cameras mounted on vehicles. One camera is oriented so as to frame the portion of road in front of the vehicle, while the other is positioned inside the vehicle and pointed at the driver.

Lorenzo Ciardelli, Andrea Beoldo, Francesco Pasini, Carlo Regazzoni
An eHealth System for a Complete Home Assistance

Home telecare systems are improving the current level of quality in healthcare services. This paper describes an eHealth system designed to support people living in their homes. The approach introduces a flexible system architecture that runs on a common residential gateway. The architecture provides basic services and openness to integrate dedicated telecare services. Special attention is paid to the integration of the patient’s relatives and friends. For that purpose, a videoconference system allows any participant to show information about availability and current status, and to communicate face-to-face or in a discussion together with other members (e.g. patient, nurse, doctor and relatives). A module to maintain medical appointments has been integrated as well.

Jaime Martín, Mario Ibañez, Natividad Martínez Madrid, Ralf Seepold
Tracking System Based on Accelerometry for Users with Restricted Physical Activity

This article aims to develop a minimally intrusive system of care and monitoring. Furthermore, the goal is to obtain a cheap, comfortable and, especially, efficient system which monitors the physical activity carried out by the user. All of this is based on the analysis of accelerometry data obtained through a mobile phone.

Besides this, we will develop a comprehensive system for consulting the recorded activity in order to provide families and care staff with an interface through which to observe the condition of the individual being monitored.

L. M. Soria-Morillo, Juan Antonio Álvarez-García, Juan Antonio Ortega, Luis González-Abril

Internet Applications

Web Query Reformulation Using Differential Evolution

This paper presents a query reformulation and clustering technique using Differential Evolution. Differential evolution (DE) has emerged as one of the fast, robust, and efficient global search heuristics of current interest. The proposed DE automatically determines the type of a query and new pattern of query reformulation.

Prabhat K. Mahanti, Mohammad Al-Fayoumi, Soumya Banerjee, Feras Al-Obeidat
On How Ants Put Advertisements on the Web

Advertising is an important aspect of the Web as many services rely on it for continued viability. This paper provides insight into the effectiveness of using ant-inspired algorithms to solve the problem of Internet advertising. The paper is motivated by the success of collaborative filtering systems and the success of ant-inspired systems in solving data mining and complex classification problems. Using the vector space formalism, a model is proposed that learns to associate ads with pages with no prior knowledge of users’ interests. The model uses historical data from users’ click-through patterns in order to improve associations. A test bed and experimental methodology is described, and the proposed model evaluated using simulation. The reported results clearly show that significant improvements in ad association performance are achievable.

Tony White, Amirali Salehi-Abari, Braden Box
Mining Association Rules from Semantic Web Data

The number of ontologies and semantic annotations available on the Web is constantly increasing. This new type of complex and heterogeneous graph-structured data raises new challenges for the data mining community. In this paper, we present a novel method for mining association rules from semantic instance data repositories expressed in RDF/S and OWL. We take advantage of the schema-level (i.e. Tbox) knowledge encoded in the ontology to derive just the appropriate transactions which will later feed traditional association rules algorithms. This process is guided by the analyst requirements, expressed in the form of a query pattern. Initial experiments performed on real world semantic data yield promising results and show the usefulness of the approach.

Victoria Nebot, Rafael Berlanga
Hierarchical Topic-Based Communities Construction for Authors in a Literature Database

In this paper, given a set of research papers with only title and author information, a mining strategy is proposed to discover and organize the communities of authors according to both the co-author relationships and research topics of their published papers. The proposed method applies the CONGA algorithm to discover collaborative communities from the network constructed from the co-author relationship. To further group the collaborative communities of authors according to research interests, CiteSeerX is used as an external source to discover the hidden hierarchical relationships among the topics covered by the papers. In order to evaluate whether the constructed topic-based collaborative community is semantically meaningful, the first part of the evaluation measures the consistency between the terms appearing in the published papers of a topic-based collaborative community and the terms in the documents related to the specific topic retrieved from another external source. The experimental results show that 81.61% of the topic-based collaborative communities satisfy the consistency requirement. In addition, the accuracy of the discovered sub-concept relationship is verified by checking the Wikipedia categories. It is shown that 75.96% of the sub-concept terms are properly assigned in the concept hierarchy.

Chien-Liang Wu, Jia-Ling Koh
Generating an Event Arrangement for Understanding News Articles on the Web

We propose a new event arrangement system for Web news browsing based on analyzing past news articles related to the browsing-targeted article. Since relevant events are important for understanding news articles, we propose an event arrangement system based on making a connection between the relevant events and providing sequences of those events. When a user chooses an event from candidate events extracted on the basis of time series and important words, the system generates other events related to the chosen one. The system enables a user to find topic sequences suiting one’s interest and to closely understand news articles.

Norifumi Hirata, Shun Shiramatsu, Tadachika Ozono, Toramatsu Shintani
Architecture for Automated Search and Negotiation in Affiliation among Community Websites and Blogs

In this paper, we present a multi-agent architecture which can reduce the user’s load when searching for affiliates in a network of community websites. We give a precise definition of the environment, namely networks of community websites. The system’s architecture is designed with scalability and easy interfacing for brokers and matchmakers in mind. We have developed a simulator to see how sites or blogs evolve with affiliation. We also show an example of its output after a 1500-iteration experiment on a network of community blogs. In our conclusion, we state several applications of critical interest and further research paths on the subject.

Robin M. E. Swezey, Masato Nakamura, Shun Shiramatsu, Tadachika Ozono, Toramatsu Shintani

Knowledge Management and Knowledge Based Systems

Effect of Semantic Differences in WordNet-Based Similarity Measures

Assessing the semantic similarity of words is a generic problem in many research fields such as artificial intelligence, biomedicine, linguistics, cognitive science and psychology. The difficulty of this task lies in how to find an effective way to simulate the process of human judgement of word similarity. In this paper, we introduce the idea of semantic differences and commonalities between words to the similarity computation process. Five new semantic similarity metrics are obtained after applying this scheme to traditional WordNet-based measures. In an experimental evaluation of our approach on a standard 28 word pairs dataset, three of the measures outperformed their classical version, while the other two performed as well as their unmodified counterparts.

Raúl Ernesto Menéndez-Mora, Ryutaro Ichise
An Ontological Representation of Documents and Queries for Information Retrieval Systems

This paper presents a vector space model approach, for representing documents and queries, using concepts instead of terms and WordNet as a light ontology. This way, information overlap is reduced with respect to the classic semantic expansion techniques. Experiments carried out on the MuchMore benchmark showed the effectiveness of the approach.

Mauro Dragoni, Célia Da Costa Pereira, Andrea G. B. Tettamanzi
Predicting the Development of Juvenile Delinquency by Simulation

A large number of delinquent activities are performed by adolescents and only occur during this period in their lives. One of the main factors that influence this behaviour is social interaction, mainly with peers. This paper contributes a computational model that predicts delinquent behaviour during adolescence based on interaction with friends and classmates. Using the model, which was validated against empirical data, the level of delinquency of pupils is simulated over time. Furthermore, simulation experiments are performed to investigate, for hypothetical scenarios, the impact of the division of students over classes on the (individual and collective) level of delinquency.

Tibor Bosse, Charlotte Gerritsen, Michel C. A. Klein
Building and Analyzing Corpus to Investigate Appropriateness of Argumentative Discourse Structure for Facilitating Consensus

Clarifying characteristics of appropriate argumentative discourse is important for developing computer assisted argumentation systems. We describe the analysis of argumentative discourse structure on the basis of Rhetorical Structure Theory in order to clarify what kind of argumentative discourse structure should be considered appropriate. We think that there exist specific agreement-oriented sequences of rhetorical relations in argumentative discourse that tend to lead to an agreement. We build a small argumentative corpus annotated with rhetorical relations and calculate posterior probabilities for bigrams of rhetorical relations to investigate what rhetorical relations precede agreement.

Tatiana Zidrasco, Shun Shiramatsu, Jun Takasaki, Tadachika Ozono, Toramatsu Shintani
Improving Identification Accuracy by Extending Acceptable Utterances in Spoken Dialogue System Using Barge-in Timing

We describe a novel dialogue strategy enabling robust interaction under noisy environments where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify one item that a user wants to indicate from system enumeration. The timing of utterances containing referential expressions is approximated by Gamma distribution, which is integrated with ASR results by expressing both of them as probabilities. In this paper, we improve the identification accuracy by extending the method. First, we enable interpretation of utterances including ordinal numbers, which appear several times in our data collected from users. Then we use proper acoustic models and parameters, improving the identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping (LSM) enables more expressions to be handled in our framework.

Kyoko Matsuyama, Kazunori Komatani, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
A New Approach to Construct Optimal Bow Tie Diagrams for Risk Analysis

Bow tie diagrams have become popular methods in risk analysis and safety management. This tool describes graphically, in the same scheme, the whole scenario of an identified risk and its respective preventive and protective barriers. The major problem with bow tie diagrams is that they remain limited by their technical level and by their restriction to the graphical representation of different scenarios without any consideration to the dynamic aspect of real systems. This paper overcomes this weakness by proposing a new Bayesian approach to construct bow ties from real data.

Ahmed Badreddine, Nahla Ben Amor

Machine Learning

Feature Selection and Occupancy Classification Using Seismic Sensors

In this paper, we consider the problem of indoor surveillance and propose a feature selection scheme for occupancy classification in an indoor environment. The classifier aims to determine whether there is exactly one occupant or more than one occupant. Data are obtained from six seismic sensors (geophones) that are deployed in a typical building hallway. Four proposed features exploit amplitude and temporal characteristics of the seismic time series. A neural network classifier achieves performance ranging from 77% to 95% on the test data, depending on the type of construction of the location in the building being monitored.

Arun Subramanian, Kishan G. Mehrotra, Chilukuri K. Mohan, Pramod K. Varshney, Thyagaraju Damarla
Extending Metric Multidimensional Scaling with Bregman Divergences

We investigate multidimensional scaling with Bregman divergences and show that the Sammon mapping can be thought of as a truncated Bregman multidimensional scaling (BMDS). We show that the full BMDS improves upon the Sammon mapping on some standard data sets and investigate the reasons underlying this improvement. We then introduce two families of BMDS which use opposite strategies to create good mappings of standard data sets and investigate these opposite strategies analytically.
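
For orientation, the standard textbook definition of a Bregman divergence is recalled here; the abstract does not state which generating function $F$ the authors use, so the symbols $F$, $p$ and $q$ are generic notation rather than the paper's own.

```latex
% Bregman divergence generated by a strictly convex, differentiable F
d_F(p, q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle

% Example: F(x) = \lVert x \rVert^2 recovers the squared Euclidean distance
d_F(p, q) = \lVert p \rVert^2 - \lVert q \rVert^2 - \langle p - q, 2q \rangle
          = \lVert p - q \rVert^2
```

The divergence is the gap between $F(p)$ and the first-order Taylor expansion of $F$ around $q$, evaluated at $p$; it is non-negative but in general not symmetric.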

Jigang Sun, Malcolm Crowe, Colin Fyfe
Independent Component Analysis Using Bregman Divergences

We review the technique of independent component analysis (ICA) and Stone’s criterion for performing an independent component analysis. We then review Bregman divergences and show how they may be applied to Stone’s criterion providing a very simple algorithm for performing ICA. We illustrate our method on two very simple data sets.

Xi Wang, Colin Fyfe
Novel Method for Feature-Set Ranking Applied to Physical Activity Recognition

Considerable attention has recently been paid in e-health and e-monitoring to the recognition of motion, postures and physical exercises from signal activity analysis. Most works are based on knowledge extraction using features that support decisions about the activity performed, with feature selection being the most critical stage. Feature selection procedures based on wrapper methods or ‘branch and bound’ are highly computationally expensive. In this paper, we propose an alternative filter method using a feature-set ranking based on a pair of statistical criteria, which achieves remarkable accuracy rates in the classification process.

Oresti Baños, Héctor Pomares, Ignacio Rojas
Time Space Tradeoffs in GA Based Feature Selection for Workload Characterization

This paper reports the results of a research effort that explores time/space tradeoffs inherent to genetic algorithms (GA). The study analyzes redundancy in the GA search space and lays out a schema for efficient utilization of record keeping in the form of a cache to minimize redundancy. The application used for evaluation of the record keeping procedure is feature selection for computer workload characterization. The experimental results demonstrate the utility of record keeping in the GA domain, and show a significant reduction in execution time with virtually the same solution quality.
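
The record-keeping idea can be sketched as a cache placed in front of the fitness function, so that a chromosome the GA revisits is not evaluated again. This is a minimal illustration under stated assumptions: the class name `CachedFitness`, the tuple encoding of a feature subset and the unbounded dictionary are hypothetical choices, and the abstract does not specify the authors' cache design.

```python
from typing import Callable, Dict, Iterable, Tuple

Chromosome = Tuple[int, ...]  # assumed encoding: one 0/1 gene per feature


class CachedFitness:
    """Wrap an expensive fitness function with a lookup table.

    Trades memory (one entry per distinct chromosome) for time
    (no repeated evaluation of chromosomes already seen).
    """

    def __init__(self, fitness: Callable[[Chromosome], float]) -> None:
        self._fitness = fitness
        self._cache: Dict[Chromosome, float] = {}
        self.evaluations = 0  # calls that reached the real fitness function
        self.hits = 0         # calls answered from the cache

    def __call__(self, chromosome: Iterable[int]) -> float:
        key = tuple(chromosome)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.evaluations += 1
        value = self._fitness(key)
        self._cache[key] = value
        return value
```

Comparing `hits` with `evaluations` after a run gives a direct measure of how much redundancy the search contained.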

Dan E. Tamir, Clara Novoa, Daniel Lowell
Learning Improved Feature Rankings through Decremental Input Pruning for Support Vector Based Drug Activity Prediction

The use of certain machine learning and pattern recognition tools for automated pharmacological drug design has been recently introduced. Different families of learning algorithms, and Support Vector Machines in particular, have been applied to the task of associating observed chemical properties and pharmacological activities with certain kinds of representations of the candidate compounds. The purpose of this work is to select an appropriate feature ordering from a large set of molecular descriptors usually used in the domain of Drug Activity Characterization. To this end, a new input pruning method is introduced and assessed with respect to commonly used feature ranking algorithms.

Wladimiro Díaz-Villanueva, Francesc J. Ferri, Vicente Cerverón
Scaling Up Feature Selection by Means of Democratization

The overwhelming amount of data that is available nowadays makes many of the existing machine learning algorithms inapplicable to many real-world problems. Two approaches have been used to deal with this problem: scaling up data mining algorithms [1] and data reduction. Nevertheless, scaling up a certain algorithm is not always feasible. One of the most common methods for data reduction is feature selection, but when we face large problems, scalability becomes an issue. This paper presents a way of removing this difficulty using several rounds of feature selection on subsets of the original dataset, combined using a voting scheme. The performance is very good in terms of testing error and storage reduction, while the execution time of the process is decreased very significantly. The method is especially efficient when we use feature selection algorithms that have a high computational cost. An extensive comparison on 27 datasets of medium and large sizes from the UCI Machine Learning Repository and using different classifiers shows the usefulness of our method.
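
The scheme of repeated selection on subsets followed by voting can be sketched as follows. This is only an illustration under assumptions: the random disjoint partition, the fixed vote threshold `min_votes` and the toy selector `nonconstant_features` are hypothetical stand-ins, since the abstract does not specify how subsets are drawn or how votes are thresholded.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set

Row = Sequence[float]
Selector = Callable[[List[Row]], Set[int]]


def vote_feature_selection(
    rows: Sequence[Row],
    selector: Selector,
    n_subsets: int,
    rounds: int,
    min_votes: int,
    seed: int = 0,
) -> Set[int]:
    """Run `selector` on disjoint random subsets over several rounds and
    keep every feature index that collects at least `min_votes` votes."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(rounds):
        shuffled = list(rows)
        rng.shuffle(shuffled)
        for i in range(n_subsets):
            subset = shuffled[i::n_subsets]  # disjoint slices of this round
            if subset:
                votes.update(selector(subset))  # one vote per selected feature
    return {feature for feature, count in votes.items() if count >= min_votes}


def nonconstant_features(subset: List[Row]) -> Set[int]:
    """Toy selector: keep features that take more than one value."""
    return {
        j for j in range(len(subset[0]))
        if len({row[j] for row in subset}) > 1
    }
```

Because each call to the selector sees only a fraction of the instances, a costly selection algorithm runs on much smaller inputs, which is where the reduction in execution time reported in the abstract would come from.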

Aida de Haro-García, Nicolás García-Pedrajas
Backmatter
Metadata
Title
Trends in Applied Intelligent Systems
Edited by
Nicolás García-Pedrajas
Francisco Herrera
Colin Fyfe
José Manuel Benítez
Moonis Ali
Copyright year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-13025-0
Print ISBN
978-3-642-13024-3
DOI
https://doi.org/10.1007/978-3-642-13025-0
