
2009 | Book

Foundations of Intelligent Systems

18th International Symposium, ISMIS 2009, Prague, Czech Republic, September 14-17, 2009. Proceedings

Editors: Jan Rauch, Zbigniew W. Raś, Petr Berka, Tapio Elomaa

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 18th International Symposium on Methodologies for Intelligent Systems, ISMIS 2009, held in Prague, Czech Republic, in September 2009. The 60 revised papers presented together with 4 plenary talks were carefully reviewed and selected from over 111 submissions. The papers are organized in topical sections on knowledge discovery and data mining, applications of intelligent systems in medicine, logical and theoretical aspects of intelligent systems, text mining, applications of intelligent systems in music, information processing, agents, machine learning, applications of intelligent systems, complex data, general AI, as well as uncertainty.

Table of Contents

Frontmatter

Invited Papers

Randomization Methods for Assessing the Significance of Data Mining Results

Data mining research has developed many algorithms for various analysis tasks on large and complex datasets. However, assessing the significance of data mining results has received less attention. Analytical methods are rarely available, and hence one has to use computationally intensive methods. Randomization approaches based on null models provide, at least in principle, a general approach that can be used to obtain empirical p-values for various types of data mining approaches. I review some of the recent work in this area, outlining some of the open questions and problems.

Heikki Mannila
Dealing with Music in Intelligent Ways

Music is not just a product of human creativity and a uniquely human means of expression, it is also a commodity of great commercial relevance. The rapid digitisation of the music market, with the global availability of ever larger amounts of music, is creating a need for musically intelligent computer systems, and lots of opportunities for exciting research.

The presentation gives an impression of the latest research in the field of intelligent music processing and music information retrieval. Based (mostly) on our own recent work, I discuss what it means for a computer to be musically intelligent, describe some of the techniques that are being developed, and demonstrate how entirely new musical interfaces and devices become possible with such methods – devices that, in effect, will change the way we listen to, and interact with, music.

Gerhard Widmer
Intelligent Systems: From R.U.R. to ISMIS 2009 and beyond

The history of humans trying to understand and provide foundations for human reasoning or intelligence can be traced back to 11th century BC Babylonian reasoning in medical diagnostics and astronomy, the Chinese tradition of thought that blossomed between the 10th and 6th centuries BC and produced the Analects of Confucius, 6th through 2nd century BC Indian philosophy and Hindu schools of thought, and 4th century BC Greece, where Aristotle gave birth to formal logic; the quest continues into the 21st century AD.

Humans have also been keen on creating “artificial life” or “intelligent systems”. The Old Testament, created between the 12th and the 2nd century BC, mentions a “servant” made from clay called Golem. You are now in Prague, where according to legend the 16th-century chief Rabbi Loew constructed the Golem out of clay from the banks of the Vltava (Moldau) river and brought it to life to protect the Jewish Ghetto. However, as the Golem grew it became increasingly violent, spreading fear, killing and eventually turning on its creator. It is believed that the first record of a “homunculus” (a representation of a human being) appeared in alchemical literature in the 3rd century AD. A different branch of inquiry aimed at building “automata”, self-operating artificial systems made to resemble human or animal actions. These can be traced to 3rd century BC China, the Greek Antikythera mechanism built about 150-100 BC, and the 8th-century Muslim alchemist Jabir ibn Hayyan (Geber), who in his coded Book of Stones included recipes for constructing artificial snakes, scorpions, and humans.

In 1921, the Czech writer Karel Čapek gave birth to the word “robot” in his science fiction play “R.U.R.” (Rossum’s Universal Robots). Today, robots produce cars, explore the universe, search for victims in natural disasters, perform delicate surgeries and harvest crops, and cuddly robots with the ability to talk are companions to the elderly and children - as portrayed in another Czech science fiction work from 1962, “Kybernetická babička” (The Cybernetic Grandmother) by Jiří Trnka - a puppet maker, illustrator, motion-picture animator and film director.

The quest to understand intelligence and create intelligent systems has a history of 32 centuries, while the International Symposium on Methodologies for Intelligent Systems, which originated in 1986, has as of this year (2009) devoted 23 years to advancing the state of the art in our understanding of intelligent behavior - both human and machine - and to developing more effective intelligent machine systems or human-machine systems.

In the early years, ISMIS focused on areas of Approximate Reasoning, Expert Systems, Intelligent Databases, Knowledge Representation, Learning and Adaptive Systems, Logic for Artificial Intelligence, and Man-Machine Interaction. ISMIS 2009 called for papers in the areas of Active Media Human-Computer Interaction, Autonomic and Evolutionary Computation, Digital Libraries, Intelligent Agent Technology, Intelligent Information Retrieval, Intelligent Information Systems, Intelligent Language Processing, Knowledge Representation and Integration, Knowledge Discovery and Data Mining, Knowledge Visualization, Logic for Artificial Intelligence, Music Information Retrieval, Soft Computing, Text Mining, Web Intelligence, Web Mining, and Web Services.

Have we solved the problems that are not in the ISMIS 2009 focus? Where do we stand? What have we learned? What are we teaching? Where should we focus our attention in the years beyond 2009? This talk will outline open research problems, educational and other societal issues, and opportunities for cooperation in methodologies for intelligent systems.

Maria Zemankova
The Art of Management and the Technology of Knowledge-Based Systems

Explicit knowledge is successfully transferable into computers. As a consequence, we have today at hand various knowledge and expert systems. The talk provides a short overview of some basic steps towards the present situation. It then focuses on the role of management in dealing effectively with knowledge, and on the role of a special kind of knowledge – the knowledge of management. A new type of knowledge storing and processing technology, resulting in a specific type of knowledge-based systems – the Knowledge Managing Systems – is proposed as a computer-based support for activities which form at least some part of the Art of Management.

Jozef Kelemen, Ivan Polášek

Knowledge Discovery and Data Mining

Frequent Itemset Mining in Multirelational Databases

This paper proposes a new approach to mining multirelational databases. Our approach is based on the representation of a multirelational database as a set of trees. Tree mining techniques can then be applied to identify frequent patterns in this kind of database. We propose two alternative schemes for representing a multirelational database as a set of trees. The frequent patterns that can be identified in such a set of trees can be used as the basis for other multirelational data mining techniques, such as association rules, classification, or clustering.

Aída Jiménez, Fernando Berzal, Juan-Carlos Cubero
A Multiple Scanning Strategy for Entropy Based Discretization

We present results of experiments performed on 14 data sets with numerical attributes using a novel discretization technique called multiple scanning. Multiple scanning is based on scanning all attributes of the data set many times; during each scan the best cut-points are found for all attributes. Results of our experiments show that multiple scanning successfully enhances, in terms of the error rate, an ordinary discretization technique based on conditional entropy.
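As general background (this is not the authors' multiple-scanning algorithm, only the standard entropy-based cut-point selection it builds on), choosing the best cut-point for a single numerical attribute can be sketched as follows; function names are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Return the cut-point on a numerical attribute that minimizes
    the weighted class entropy of the induced binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal attribute values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut
```

In the multiple-scanning setting described above, such a selection would be repeated for every attribute on each scan.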

Jerzy W. Grzymala-Busse
Fast Subgroup Discovery for Continuous Target Concepts

Subgroup discovery is a flexible data mining method for a broad range of applications. It considers a given property of interest (target concept), and aims to discover interesting subgroups with respect to this concept. In this paper, we especially focus on the handling of continuous target variables and describe an approach for fast and efficient subgroup discovery for such target concepts. We propose novel formalizations of effective pruning strategies for reducing the search space, and we present the SD-Map* algorithm that enables fast subgroup discovery for continuous target concepts. The approach is evaluated using real-world data from the industrial domain.

Martin Atzmueller, Florian Lemmerich
Discovering Emerging Graph Patterns from Chemicals

Emerging patterns are patterns of great interest for characterizing classes. This task remains a challenge, especially with graph data. In this paper, we propose a method to mine the whole set of frequent emerging graph patterns, given a frequency threshold and an emergence threshold. Our results are achieved thanks to a change of the description of the initial problem, so that we are able to design a process combining efficient algorithmic and data mining methods. Experiments on a real-world database composed of chemicals show the feasibility and the efficiency of our approach.

Guillaume Poezevara, Bertrand Cuissart, Bruno Crémilleux
Visualization of Trends Using RadViz

Data mining sometimes deals with data consisting of items that represent measurements of a single property taken at different time points. In this case the data can be understood as a time series of one feature. It is not exceptional that the key to evaluating such data is related to their development trends as observed in several successive time points.

From the qualitative point of view, one can distinguish three basic types of behavior between two neighboring time points: the value of the feature is stable (remains the same), it grows, or it falls. This paper is concerned with the identification of typical qualitative development patterns as they appear in windows of a given length in the considered time-stamped data, and with their utilization for the specification of interesting subgroups.
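The stable/grows/falls encoding described above can be sketched in a few lines (our own illustration, not the authors' implementation; names are hypothetical):

```python
from collections import Counter

def qualitative_pattern(series):
    """Encode each transition between neighboring time points:
    'S' = stable, 'U' = the value grows, 'D' = the value falls."""
    return ''.join(
        'S' if cur == prev else 'U' if cur > prev else 'D'
        for prev, cur in zip(series, series[1:])
    )

def frequent_windows(series, width):
    """Count qualitative patterns in sliding windows of `width`
    time points (each window spans width - 1 transitions)."""
    code = qualitative_pattern(series)
    w = width - 1
    return Counter(code[i:i + w] for i in range(len(code) - w + 1))
```

The most frequent window patterns returned by such a count are natural candidates for the "typical qualitative development patterns" used to specify subgroups.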

Lenka Nováková, Olga Štěpánková
Action Rules Discovery Based on Tree Classifiers and Meta-actions

Action rules describe possible transitions of objects from one state to another with respect to a distinguished attribute. Early research on action rule discovery usually required the extraction of classification rules before constructing any action rule. The newest algorithms discover action rules directly from a decision system. To our knowledge, all these algorithms assume that all attributes are symbolic, or require prior discretization of all numerical attributes. This paper presents a new approach for generating action rules from datasets with numerical attributes by incorporating a tree classifier and a pruning step based on meta-actions. Meta-actions are seen as higher-level knowledge (provided by experts) about correlations between different attributes.

Zbigniew W. Raś, Agnieszka Dardzińska
Action Rules and the GUHA Method: Preliminary Considerations and Results

The paper presents an alternative approach to action rules. The approach is based on experience with the GUHA method and the LISp-Miner system. G-action rules are introduced, and first experience with the new GUHA procedure Ac4ft-Miner, which mines G-action rules, is described.

Jan Rauch, Milan Šimůnek
Semantic Analytical Reports: A Framework for Post-processing Data Mining Results

Intelligent post-processing of data mining results can provide valuable knowledge. In this paper we present the first systematic solution to post-processing that is based on semantic web technologies. The framework input is constituted by PMML and a description of background knowledge. Using the Topic Maps formalism, a generic Data Mining ontology and an Association Rule Mining ontology were designed. Through a combination of a content management system and a semantic knowledge base, the analyst can enter new pieces of information or interlink existing ones. The information is accessible either via semi-automatically authored textual analytical reports or via semantic querying. A prototype implementation of the framework for generalized association rules is demonstrated on the PKDD’99 Financial Data Set.

Tomáš Kliegr, Martin Ralbovský, Vojtěch Svátek, Milan Šimůnek, Vojtěch Jirkovský, Jan Nemrava, Jan Zemánek

Applications of Intelligent Systems in Medicine

Medical Decision Making through Fuzzy Computational Intelligent Approaches

A new approach to the construction of Fuzzy Cognitive Maps, augmented by knowledge obtained through fuzzy rule-extraction methods, is investigated for medical decision making. This approach develops an augmented Fuzzy Cognitive Mapping based Decision Support System combining knowledge from experts with knowledge from data, in the form of fuzzy rules generated by rule-based knowledge discovery methods. Fuzzy Cognitive Mapping (FCM) is a fuzzy modeling methodology based on exploiting knowledge and experience from experts. The FCM, accompanied by knowledge extraction and computational intelligence techniques, contributes to the development of a decision support system in medical informatics. The proposed approach is applied to a well-known medical problem: the assessment of the treatment planning decision process in radiotherapy.

Elpiniki I. Papageorgiou
Fuzzy Cognitive Map Based Approach for Assessing Pulmonary Infections

The decision making problem of predicting infectious diseases is a complex process because of the numerous elements/parameters (such as symptoms, signs, physical examination, laboratory tests, cultures, chest x-rays, etc.) involved in its operation, and permanent attention is demanded. The knowledge of physicians derived from physical examination and clinical measurements is the key to reaching a diagnosis and monitoring patient status. In this paper, the Fuzzy Cognitive Mapping approach is investigated to handle the problem of pulmonary infections during patient admission to the hospital or to the Intensive Care Unit (ICU). This is the first step in the development of a decision support system for the prediction of infectious diseases.

Elpiniki I. Papageorgiou, Nikolaos Papandrianos, Georgia Karagianni, G. Kyriazopoulos, D. Sfyras
A Knowledge-Based Framework for Information Extraction from Clinical Practice Guidelines

Clinical Practice Guidelines guide decision making in problems such as diagnosis and prevention for specific clinical circumstances. They are usually available in the form of textual documents written in natural language whose interpretation, however, can make their implementation difficult. Additionally, the high number of available documents and the presence of information for different decision problems in the same document can further hinder their use. In this paper, we propose a framework to extract, from textual clinical guidelines, practices and indications considered to be important in a particular clinical circumstance for a specific decision problem. The framework operates in two consecutive phases: the first aims at extracting pieces of information relevant to each decision problem from the documents, while the second exploits these pieces of information to generate a structured representation of the clinical practice guidelines for each decision problem. The application to the context of the Metabolic Syndrome proves the effectiveness of the proposed framework.

Corrado Loglisci, Michelangelo Ceci, Donato Malerba
RaJoLink: A Method for Finding Seeds of Future Discoveries in Nowadays Literature

In this article we present a study which demonstrates the ability of the RaJoLink method to uncover candidate hypotheses for future discoveries from rare terms in existing literature. The method is inspired by Swanson’s ABC model approach to finding hidden relations in a set of articles in a given domain. The main novelty is a semi-automated way of suggesting which relations might have more potential for new discoveries and are therefore good candidates for further investigation. In our previous articles we reported on a successful application of the RaJoLink method in the autism domain. To support the evaluation of the method with a well-known example from the literature, we applied it to the migraine domain, aiming at reproducing Swanson’s finding of magnesium deficiency as a possible cause of migraine. Only literature that was available at the time of Swanson’s experiment was used in our test. As described in this study, in addition to actually uncovering magnesium as a candidate for formulating the hypothesis, RaJoLink also pointed to interferon, interleukin and tumor necrosis factor as candidates for potential discoveries connecting them with migraine. These connections were not published in the titles contemporary to the ones used in the experiment, but have recently been reported in several scientific articles. This confirms the ability of the RaJoLink method to uncover seeds of future discoveries in existing literature by using rare terms as a beacon.

Tanja Urbančič, Ingrid Petrič, Bojan Cestnik

Logical and Theoretical Aspects of Intelligent Systems

Automatic Generation of P2P Mappings between Sources Schemas

This paper deals with the problem of automatic generation of mappings between data source schemas by exploiting existing centralized mappings between source schemas and a global ontology. We formalize this problem in the setting of description logics and show that it can be reduced to a problem of rewriting queries using views. We identify two subproblems: the first is equivalent to the well-known problem of computing maximally contained rewritings, while the second constitutes a new instance of the query rewriting problem in which the goal is to compute minimal rewritings that contain a given query. We distinguish two cases for solving this latter problem: (i) for languages closed under negation, the problem is reduced to the classic problem of rewriting queries using views, and (ii) for languages with the property of structural subsumption, a technique based on hypergraphs is proposed to solve it.

Karima Toumani, Hélene Jaudoin, Michel Schneider
An OWL Ontology for Fuzzy OWL 2

The need to deal with vague information in Semantic Web languages is rising in importance and, thus, calls for a standard way to represent such information. We may address this issue by either extending current Semantic Web languages to cope with vagueness, or by providing an ontology describing how to represent such information within Semantic Web languages. In this work, we follow the latter approach and propose and discuss an OWL ontology to represent important features of fuzzy OWL 2 statements.

Fernando Bobillo, Umberto Straccia
Fuzzy Clustering for Categorical Spaces
An Application to Semantic Knowledge Bases

A multi-relational clustering method is presented which can be applied to complex knowledge bases storing resources expressed in the standard Semantic Web languages. It adopts effective and language-independent dissimilarity measures that are based on a finite number of dimensions corresponding to a committee of discriminating features (represented by concept descriptions). The clustering algorithm expresses the possible clusterings as tuples of central elements (medoids, w.r.t. the given metric) of variable length. It iteratively adjusts these centers following the rationale of the fuzzy clustering approach, i.e. one where membership to each cluster is not deterministic but rather ranges in the unit interval. Experimentation with some ontologies proves the feasibility of our method and its effectiveness in terms of clustering validity indices.

Nicola Fanizzi, Claudia d’Amato, Floriana Esposito
Reasoning about Relations with Dependent Types: Application to Context-Aware Applications

Generally, ontological relations are modeled using fragments of first order logic (FOL) and difficulties arise when meta-reasoning is done over ontological properties, leading to reason outside the logic. Moreover, when such systems are used to reason about knowledge and meta-knowledge, classical languages are not able to cope with different levels of abstraction in a clear and simple way. In order to address these problems, we suggest a formal framework using a dependent (higher order) type theory. It maximizes the expressiveness while preserving decidability of type checking and results in a coherent theory. Two examples of meta-reasoning with transitivity and distributivity and a case study illustrate this approach.

Richard Dapoigny, Patrick Barlatier
Quasi-Classical Model Semantics for Logic Programs – A Paraconsistent Approach

We present a new paraconsistent approach to logic programming, called Quasi-classical (QC for short) model semantics. The basic idea is the following. We define the QC base as the set of all atoms and their complements, which decouples the link between an atom and its complement at the level of interpretation. We then define QC models for positive logic programs. When applied to disjunctive programs, the QC model semantics imposes the link between each disjunction occurring in the head of a rule and its complement disjunct. This enhances the ability of paraconsistent reasoning. We also define weak satisfaction to perform reasoning under our approach. The fixpoint semantics with respect to the QC model semantics is also presented in the paper.

Zhihu Zhang, Zuoquan Lin, Shuang Ren
Prime Implicates and Reduced Implicate Tries

The reduced implicate trie (ri-trie) is a data structure that was introduced as a target language for knowledge compilation. It has the property that, even when large, it guarantees fast response to queries. It also has the property that each prime implicate of the formula it represents corresponds to a branch. In this paper, those prime branches are characterized, and a technique is developed for marking nodes to identify branches that correspond to non-prime implicates. This marking technique is enhanced to allow discovery of prime implicate subsets of queries that are answered affirmatively.

Neil V. Murray, Erik Rosenthal
Logic for Reasoning about Components of Persuasive Actions

The aim of the paper is to propose an extension of the logic $\mathcal{AG}_n$. Thus far, $\mathcal{AG}_n$ was applied to reasoning about the persuasiveness of actions in multi-agent systems, i.e., we examined which arguments, provided by agents, are successful and how big such a success is. Now we enrich our approach in order to study why these arguments are efficient and what attributes cause their success. Therefore, we propose to specify persuasive actions with three parameters: content, goal and means of sending messages. As a result, we can formally express what an agent wants to achieve by executing an action, whether this action can be successful, and if not, recognize the reasons which can cause the failure.

Katarzyna Budzynska, Magdalena Kacprzak, Pawel Rembelski
A Hybrid Method of Indexing Multiple-Inheritance Hierarchies

The problem of efficient processing of the basic operations on ontologies, such as subsumption checking or finding all subtypes of a given type, is becoming highly important. In the paper we present a hybrid approach to organizing multi-hierarchical structures, combining numbering schemes [1], [13] with “gene-based” methods [10], [17]. The proposed method generalizes earlier solutions and inherits the advantages of earlier approaches. The obtained structure preserves the feature of incremental changes to the ontology structure. The experiments performed show significant efficiency gains in accessing ontology resources for processes specific to the semantic web.

Jacek Lewandowski, Henryk Rybinski

Text Mining

Theme Extraction from Chinese Web Documents Based on Page Segmentation and Entropy

Web pages often contain “clutter” (defined by us as unnecessary images, navigational menus and extraneous ad links) around the body of an article that may distract users from the actual content. Therefore, how to extract useful and relevant themes from these web pages has become a research focus. This paper proposes a new method for web theme extraction. The method first uses a page segmentation technique to divide a web page into many unrelated blocks, then calculates the entropy of each block and that of the entire web page, then prunes redundant blocks whose entropies are larger than the threshold of the web page, and finally exports the remaining blocks as the theme of the web page. Experiments verify that the new method is effective for theme extraction from Chinese web pages.
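The entropy-based pruning idea can be illustrated in miniature (this is our own sketch, not the paper's algorithm; it uses the whole-page entropy as the threshold and treats each block as a bag of terms):

```python
from collections import Counter
from math import log2

def term_entropy(text):
    """Shannon entropy of the term-frequency distribution of a block."""
    counts = Counter(text.split())
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def theme_blocks(blocks):
    """Keep blocks whose term entropy does not exceed the entropy of
    the whole page; high-entropy blocks (e.g. navigation menus made of
    many unrelated terms) are pruned as clutter."""
    page_entropy = term_entropy(' '.join(blocks))
    return [b for b in blocks if term_entropy(b) <= page_entropy]
```

A repetitive content block has a peaked term distribution (low entropy), while a menu of unrelated links has a near-uniform one (high entropy), which is what this filter exploits.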

Deqing Wang, Hui Zhang, Gang Zhou
Topic-Based Hard Clustering of Documents Using Generative Models

In this paper, we describe a framework for clustering documents according to their mixtures of topics. The proposed framework combines the expressiveness of generative models for document representation with a properly chosen information-theoretic distance measure to group the documents via an agglomerative hierarchical clustering scheme. The clustering solution obtained at each level of the dendrogram reflects an organization of the documents into sets of topics, while being produced without the effort needed for a soft/fuzzy clustering method. Experimental results obtained on large, real-world collections of documents evidence the effectiveness of our approach in detecting non-overlapping clusters that contain documents sharing similar mixtures of topics.

Giovanni Ponti, Andrea Tagarelli
Boosting a Semantic Search Engine by Named Entities

Traditional Information Retrieval (IR) systems are based on the bag-of-words representation. This approach retrieves relevant documents by lexical matching between query and document terms. Due to synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels which integrate (and do not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. This paper focuses on the named entity level. Our aim is to prove that named entities are useful for improving retrieval performance. We exploit a model able to capture entity relationships, even though they are not explicit in document text. Experiments on a CLEF dataset prove the effectiveness of our hypothesis.

Annalina Caputo, Pierpaolo Basile, Giovanni Semeraro
Detecting Temporal Trends of Technical Phrases by Using Importance Indices and Linear Regression

In this paper, we propose a method for detecting temporal trends of technical terms based on importance indices and linear regression. In text mining, importance indices of terms, such as simple frequency, the document frequency of the terms, and their tf-idf, play a key role in finding valuable patterns in documents. Documents are often published daily, monthly, annually, or irregularly for each purpose. Although the purposes of each set of documents do not change, the roles of terms and the relationships among them in the documents change over time. In order to detect such temporal changes, we combined a method to extract terms, importance indices of terms, and trend identification based on linear regression analysis. Empirical results show that our method detected emergent and subsiding trends of extracted terms in a corpus of a research domain. By comparing this method with an existing burst detection method, we investigated the trends of phrases consisting of several burst words in the titles of AAAI and IJCAI.
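The core trend-identification step (fitting a regression line to a term's importance index over time and reading the sign of the slope) can be sketched as follows; this is a generic illustration, not the authors' exact procedure:

```python
def trend_slope(values):
    """Least-squares slope of an importance index (e.g. tf-idf of a
    term) over equally spaced time periods."""
    n = len(values)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(values) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def classify_trend(values, eps=1e-9):
    """Positive slope = emergent term, negative = subsiding, else flat."""
    s = trend_slope(values)
    return 'emergent' if s > eps else 'subsiding' if s < -eps else 'flat'
```

Applied per extracted term, this labels each term's temporal behavior from a series of yearly importance values.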

Hidenao Abe, Shusaku Tsumoto

Applications of Intelligent Systems in Music

Detecting Emotions in Classical Music from MIDI Files

At a time when the quantity of sounds surrounding us is rapidly increasing and the access to different recordings as well as the amount of music files available on the Internet is constantly growing, the problem of building music recommendation systems including systems which can automatically detect emotions contained in music files is of great importance. In this article, a new strategy for emotion detection in classical music pieces which are in MIDI format is presented. A hierarchical model of emotions consisting of two levels, L1 and L2, is used. A collection of harmonic and rhythmic attributes extracted from music files allowed for emotion detection with an average of 83% accuracy at level L1.

Jacek Grekow, Zbigniew W. Raś
Mining Musical Patterns: Identification of Transposed Motives

Automatic extraction of frequently repeated patterns in music material is an interesting problem. This paper presents an effective unsupervised method for discovering frequent patterns in symbolic music sources. Patterns are discovered even if they are transposed. Experiments on some songs suggest that our approach is promising, especially when dealing with songs that include non-exact repetitions.

Fernando Berzal, Waldo Fajardo, Aída Jiménez, Miguel Molina-Solana
Musical Instruments in Random Forest

This paper describes automatic classification of the predominant musical instrument in sound mixes, using random forests as classifiers. The paper gives a description of the sound parameterization applied and of the random forest classification methodology. Additionally, the significance of the sound parameters used as conditional attributes is investigated. The results show that almost all sound attributes are informative, and that the random forest technique yields much higher classification accuracy than the support vector machines used in previous research on these data.

Miron Kursa, Witold Rudnicki, Alicja Wieczorkowska, Elżbieta Kubera, Agnieszka Kubik-Komar
Application of Analysis of Variance to Assessment of Influence of Sound Feature Groups on Discrimination between Musical Instruments

In this paper, the influence of selected sound features on distinguishing between musical instruments is presented. The features were chosen based on our previous research. Coherent groups of features were created on the basis of significant features, adding complementary ones according to the parameterization method applied, to constitute small, homogeneous groups. Next, we investigate (for each feature group separately) whether there exist significant differences between the means of these features for the studied instruments. We apply multivariate analysis of variance along with post hoc analysis in the form of homogeneous groups, defined by the mean values of the investigated features for our instruments. If a statistically significant difference is found, then a homogeneous group is established. Such a group may consist of one instrument (distinguished by this feature), or more (instruments similar with respect to this feature). The results show which instruments can best be discerned by which features.

Alicja Wieczorkowska, Agnieszka Kubik-Komar

Information Processing

Alternative Formulas for Rating Prediction Using Collaborative Filtering

This paper proposes and evaluates several alternative design choices for the common prediction metrics employed by the neighborhood-based collaborative filtering approach. It first explores the role of different baseline user averages as the foundation of similarity weighting and rating normalization in prediction, evaluating the results in comparison to traditional neighborhood-based metrics on the MovieLens data set. The approach is further evaluated on the Netflix movie data set, using a baseline correlation formula between movies, without meta-knowledge. For the Netflix domain, the approach is augmented with a significance weighting variant that results in an improvement over the original metric. The resulting approach is shown to improve accuracy for neighborhood-based collaborative filtering, and it is general and applicable to establishing relationships among agents sharing a common list of items that establishes their preferences.

Amar Saric, Mirsad Hadzikadic, David Wilson
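The classical neighborhood-based prediction that the paper varies can be sketched as follows: a baseline (here the user's mean rating, one of the alternatives discussed) plus a similarity-weighted average of neighbors' deviations from their own means. This is the traditional textbook metric, not the authors' exact variant, and the ratings below are invented.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = sqrt(sum((u[i] - mu) ** 2 for i in common)) * \
          sqrt(sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Baseline (user mean) plus similarity-weighted neighbor deviations."""
    base = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        num += w * (r[item] - sum(r.values()) / len(r))
        den += abs(w)
    return base + num / den if den else base
```

Swapping a different baseline average into `predict` is exactly the kind of design choice the paper evaluates.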
On Three Classes of Division Queries Involving Ordinal Preferences

In this paper, we are interested in taking preferences into account for a family of queries inspired by the relational division. A division query aims at retrieving the elements associated with a specified set of values, and usually the results remain undiscriminated. We therefore suggest introducing preferences inside such queries with the following specificities: i) the user gives his/her preferences in an ordinal way, and ii) the preferences apply to the divisor, which is defined as a hierarchy of sets. Different uses of the hierarchy are investigated, which leads to queries conveying different semantics, and the property of the result in terms of a quotient is studied. Special attention is paid to the implementation of such queries using a regular database management system, and some experimental results illustrate the feasibility of the approach.

Patrick Bosc, Olivier Pivert, Olivier Soufflet
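Classical relational division, which the queries above extend with ordinal preferences, can be sketched in a few lines. The suppliers/parts example is hypothetical, and the paper's preference and hierarchy machinery is not reproduced.

```python
def divide(dividend, divisor):
    """Relational division: return the elements associated with
    every value in the divisor.

    dividend: set of (element, value) pairs; divisor: set of values.
    """
    elements = {e for e, _ in dividend}
    return {e for e in elements
            if divisor <= {v for x, v in dividend if x == e}}
```

The undiscriminated result set returned here is precisely what the paper's ordinal preferences on the divisor would rank.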
Analyses of Knowledge Creation Processes Based on Different Types of Monitored Data

This paper presents specialized methods for analyzing knowledge creation processes and knowledge practices that are (at least partially) projected in work within a virtual collaborative environment. Support for such analysis and evaluation of knowledge creation processes is provided by historical data stored in the virtual working environment, describing various aspects of the monitored processes (e.g. semantic information, content, logs of activities, etc.). The proposed analytical methods cover different types of analysis, such as (a) statistical analysis, which provides information about processes and the possibility to visualize such information based on user-selected presentation modes; and (b) time-line based analysis, which supports visualization of the real process execution with all relevant information, including the possibility to identify and further analyze working patterns in knowledge creation processes (projection of knowledge practices). Experimental evaluation of the proposed methods is carried out within the IST EU project KP-Lab.

Ján Paralič, František Babič, Jozef Wagner, Ekaterina Simonenko, Nicolas Spyratos, Tsuyoshi Sugibuchi
Intelligent Information Processing in Semantically Enriched Web

Acquiring information from the Web is a demanding task and currently the subject of worldwide research. In this paper we focus on research into methods, and experience with the development of software tools, designed for retrieval, organization, and presentation of information in heterogeneous data source spaces such as the Web. We see the Web as a unique, evolving and unbounded information system. The presented concepts can also be used in other specific contexts of information systems in organizations, which increasingly become worldwide and woven together with respect to information processing.

Pavol Návrat, Mária Bieliková, Daniela Chudá, Viera Rozinajová

Agents

Modeling Ant Activity by Means of Structured HMMs

Modeling societies of individuals is a challenging task increasingly attracting the interest of the machine learning community. Here we present an application of graphical model methods in order to model the behavior of an ant colony. Ants are tagged with RFID so that their paths through the environment can be constantly recorded. A Structured Hidden Markov Model has been used to build the model of single individual activities. Then, the global profile of the colony has been traced during the migration from one nest to another. The method provided significant information concerning the social dynamics of ant colonies.

Guenael Cabanes, Dominique Fresnau, Ugo Galassi, Attilio Giordana
Modern Approach for Building of Multi-Agent Systems

Different approaches to distributed programming on modern hardware architectures allow developers to build efficient solutions to complicated technical and information problems. Technologies such as Web Services allow applications to exchange data across platforms. Multi-agent systems, in which communication between the agents is essential for the proper working of such applications, can be developed using the technology of Service Oriented Architecture (SOA). This article presents how to apply modern programming technologies, design patterns and software architectures to building standards for multi-agent systems.

Lukasz Chomatek, Aneta Poniszewska-Marańda
Relational Sequence Clustering for Aggregating Similar Agents

Many clustering methods are based on flat descriptions, while data regarding real-world domains include heterogeneous objects related to each other in multiple ways. For instance, in the field of Multi-Agent Systems, multiple agents interact with the environment and with other agents. In this case, in order to act effectively, an agent should be able to recognise the behaviours adopted by other agents. Actions taken by an agent are sequential, and thus its behaviour can be expressed as a sequence of actions. Inferring knowledge about competing and/or companion agents by observing their actions is very beneficial for constructing a behavioural model of the agent population. In this paper we propose a clustering method for relational sequences able to aggregate companion agent behaviours. The algorithm has been tested on a real-world dataset, demonstrating its validity.

Grazia Bombini, Nicola Di Mauro, Stefano Ferilli, Floriana Esposito
FutureTrust Algorithm in Specific Factors on Mobile Agents

In this paper we present a comparative analysis of well-known reputation systems and our proposal in the light of external factors. We present a new meta-heuristic formula for use in reputation systems and show how different well-known global and local reputation metrics are optimized by it. The presented experiments continue our research into the behavior of mobile agents in an open environment.

Michał Wolski, Mieczysław Kłopotek

Machine Learning

Ensembles of Abstaining Classifiers Based on Rule Sets

The role of abstaining from prediction by component classifiers in rule ensembles is discussed. We consider bagging and Ivotes approaches to constructing such ensembles. In our proposal, component classifiers are based on unordered sets of rules with a classification strategy that solves ambiguous matching of the object’s description to the rules. We propose to induce rule sets by a sequential covering algorithm and to apply classification strategies using either rule support or discrimination measures. We adapt the classification strategies to abstaining by not using partial matching. Another contribution of this paper is an experimental evaluation of the effect of abstaining on the performance of ensembles. Results of comprehensive comparative experiments show that abstaining rule set classifiers improve accuracy; however, this effect is more visible for bagging than for Ivotes.

Jerzy Błaszczyński, Jerzy Stefanowski, Magdalena Zając
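The voting scheme with abstention described above can be sketched generically. The rule-induction part of the paper is not reproduced; components here are plain callables that return a class label or `None` to abstain, a simplification of the paper's rule-matching strategies.

```python
def ensemble_predict(classifiers, x, default=None):
    """Majority vote over component classifiers; abstentions (None)
    are simply dropped from the vote."""
    votes = [c(x) for c in classifiers]
    votes = [v for v in votes if v is not None]
    if not votes:
        return default  # every component abstained
    return max(set(votes), key=votes.count)
```

A component that abstains in its region of uncertainty no longer dilutes the vote of the confident components, which is the effect the paper measures experimentally.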
Elicitation of Sugeno Integrals: A Version Space Learning Perspective

Sugeno integrals can be viewed as multiple criteria aggregation functions which take into account a form of synergy between criteria. As such, Sugeno integrals constitute an important family of tools for modeling qualitative preferences defined on ordinal scales. The elicitation of Sugeno integrals starts from a set of data that associates a global evaluation assessment to situations described by multiple criteria values. A consistent set of data corresponds to a non-empty family of Sugeno integrals with which the data are compatible. This elicitation process presents some similarity with the revision process underlying the version space approach in concept learning, when new data are introduced. More precisely, the elicitation corresponds to a graded extension of version space learning, recently proposed in the framework of bipolar possibility theory. This paper establishes the relation between these two formal settings.

Henri Prade, Agnes Rico, Mathieu Serrurier
Efficient MAP Inference for Statistical Relational Models through Hybrid Metaheuristics

Statistical Relational Models are state-of-the-art representation formalisms at the intersection of logical and statistical machine learning. One of the most promising models is Markov Logic (ML) which combines Markov networks (MNs) and first-order logic by attaching weights to first-order formulas and using these as templates for features of MNs. MAP inference in ML is the task of finding the most likely state of a set of output variables given the state of the input variables and this problem is NP-hard. In this paper we present an algorithm for this inference task based on the Iterated Local Search (ILS) and Robust Tabu Search (RoTS) metaheuristics. The algorithm performs a biased sampling of the set of local optima by using RoTS as a local search procedure and repetitively jumping in the search space through a perturbation operator, focusing the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for the optimization engine. We show through extensive experiments in real-world domains that it improves over the state-of-the-art algorithm in terms of solution quality and inference time.

Marenglen Biba, Stefano Ferilli, Floriana Esposito
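The Iterated Local Search component of the algorithm has a simple generic skeleton: descend to a local optimum, perturb it to jump elsewhere in the space, descend again, and keep the better solution. In the sketch below a plain first-improvement bit-flip descent stands in for Robust Tabu Search, and a toy bit-vector objective stands in for the MAP scoring function; none of this reproduces the authors' actual Markov Logic machinery.

```python
import random

def iterated_local_search(cost, local_search, perturb, x0, iters=30):
    """ILS skeleton: descend, jump via perturbation, keep improvements."""
    best = local_search(x0)
    for _ in range(iters):
        candidate = local_search(perturb(best))
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Toy instantiation on bit vectors (stands in for MAP assignments).
TARGET = [1, 0, 1, 1, 0, 1]
cost = lambda x: sum(a != b for a, b in zip(x, TARGET))

def hill_climb(x):
    """First-improvement bit-flip descent (stands in for RoTS)."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:i] + [1 - x[i]] + x[i + 1:]
            if cost(y) < cost(x):
                x, improved = y, True
    return x

def perturb(x, k=2):
    """Flip k random bits to escape the current local optimum."""
    y = x[:]
    for i in random.sample(range(len(y)), k):
        y[i] = 1 - y[i]
    return y
```

The "biased sampling of local optima" the abstract mentions is exactly the set of points `local_search` returns across iterations.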
Combining Time and Space Similarity for Small Size Learning under Concept Drift

We present a concept-drift-responsive method for classifier training on sequential data. Relevant instances for training are selected based on their similarity to the target observation, combining similarity in space and in time. The algorithm determines an optimal training set size and can be used with different plugged-in base classifiers. The proposed algorithm shows the best accuracy in its peer group, and its complexity is reasonable for field applications.

Indrė Žliobaitė
Similarity and Kernel Matrix Evaluation Based on Spatial Autocorrelation Analysis

We extend the framework of spatial autocorrelation analysis to Reproducing Kernel Hilbert Spaces (RKHS). Our results are based on the fact that some geometrical neighborhood structures vary when samples are mapped into an RKHS, while other neighborhood structures do not. These results allow us to design a new measure of the goodness of a kernel and, more generally, of a similarity matrix. Experiments on UCI datasets show the relevance of our methodology.

Vincent Pisetta, Djamel A. Zighed

Applications of Intelligent Systems

Job Offer Management: How Improve the Ranking of Candidates

The market of online job search sites is growing exponentially, which implies that the resulting volumes of information (mostly in the form of free text) become impossible to process manually. Assisted analysis and categorization seem relevant to address this issue. We present E-Gen, a system which aims to perform assisted analysis and categorization of job offers and of the responses of candidates. This paper presents several strategies, based on vectorial and probabilistic models, to solve the problem of profiling applications according to a specific job offer. Our objective is a system capable of reproducing the judgement of the recruitment consultant. We have evaluated a range of similarity measures to rank candidatures by using ROC curves. A relevance feedback approach allows us to surpass our previous results on this difficult, diverse and highly subjective task.

Rémy Kessler, Nicolas Béchet, Juan-Manuel Torres-Moreno, Mathieu Roche, Marc El-Bèze
Discovering Structured Event Logs from Unstructured Audit Trails for Workflow Mining

Workflow mining aims to find graph-based process models based on activities, emails, and various event logs recorded in computer systems. Current workflow mining techniques mainly deal with well-structured and well-symbolized event logs. In most real applications, where workflow management software tools are not installed, these structured and symbolized logs are not available. Instead, the artifacts of daily computer operations may be readily available. In this paper, we propose a method to map these artifacts and content-based logs to structured logs, so as to bridge the gap between the unstructured logs of real-life situations and the status quo of workflow mining techniques. Our method consists of two tasks: discovering workflow instances and discovering activity types. We use a clustering method to tackle the first task and a classification method to tackle the second, and we propose a way to combine the two tasks to improve their performance as a whole. Experimental results on simulated data show the effectiveness of our method.

Liqiang Geng, Scott Buffett, Bruce Hamilton, Xin Wang, Larry Korba, Hongyu Liu, Yunli Wang
GIS-FLSolution: A Spatial Analysis Platform for Static and Transportation Facility Location Allocation Problem

The static and transportation facility location allocation problem is a new problem in facility location research. It aims to find optimal locations of static and transportation facilities to serve an objective area with minimum costs. The problem is challenging because two types of facilities are involved, and the locations of transportation facilities depend on the locations of static facilities and demand objects. This paper proposes a new stand-alone GIS platform, GIS-FLSolution, to solve the problem. Combined with a customized algorithm called STFLS, the platform is built on MapObjects and provides results through a friendly graphical user interface. Preliminary experiments have been conducted to demonstrate the efficiency and practicality of the platform.

Wei Gu, Xin Wang, Liqiang Geng
A CBR System for Knowing the Relationship between Flexibility and Operations Strategy

Changing environments are driving firms towards the development of new techniques for the decision making process, in order to adapt rapidly to alterations and adjustments of the market. In this context, the relationship between operations strategy and flexibility plays a fundamental role in increasing performance goals. For this reason, this paper presents a Fuzzy Probabilistic Case-Based Reasoning (FP-CBR) system which studies the relationship between flexibility and operations strategy in a real sample of engineering consulting firms in Spain. The objective is to develop a framework of analysis based on CBR and fuzzy logic, whose accuracy is measured in order to provide scientific evidence for the conclusions and to help managers make decisions about their firms.

Daniel Arias-Aranda, Juan L. Castro, Maria Navarro, José M. Zurita
Semantic-Based Top-k Retrieval for Competence Management

We present a knowledge-based system for skills and talent management, exploiting semantic technologies combined with top-k retrieval techniques. The system provides advanced distinguishing features, including the possibility to formulate queries expressing both strict requirements and preferences in the requested profile, and a semantic-based ranking of retrieved candidates. Based on the knowledge formalized within a domain ontology, the system implements an approach exploiting top-k reasoning services to evaluate the semantic similarity between the requested profile and the retrieved ones. System performance is discussed through the presentation of experimental results.

Umberto Straccia, Eufemia Tinelli, Simona Colucci, Tommaso Di Noia, Eugenio Di Sciascio
A New Strategy Based on GRASP to Solve a Macro Mine Planning

In this paper we introduce a greedy randomized adaptive search procedure (GRASP) algorithm for solving a copper mine planning problem. In the last 10 years this real-world problem has been tackled using linear integer programming and constraint programming. Our mine planning problem is a large-scale problem; thus, in order to find an optimal solution using complete methods, the model had to be simplified by relaxing many constraints. We now present a GRASP algorithm which works with the complete model and is able to find better feasible near-optimal solutions than the complete approach that has been used until now.

María-Cristina Riff, Eridan Otto, Xavier Bonnaire
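The GRASP scheme has the same two-phase shape regardless of the problem: greedily build a solution while picking at random from a restricted candidate list (RCL), then run a local search, repeating over several restarts. The toy 0/1 knapsack below stands in for the mine model, whose variables and constraints are not reproduced; the `alpha` RCL parameter is an illustrative choice, not the authors'.

```python
import random

def grasp_knapsack(items, capacity, alpha=0.3, restarts=20):
    """GRASP sketch on a toy knapsack: randomized greedy construction
    followed by a swap-based local search, best solution over restarts.
    items: list of (value, weight) pairs."""
    def value(sol):
        return sum(items[i][0] for i in sol)

    def construct():
        chosen, weight = [], 0
        cand = [i for i, (v, w) in enumerate(items) if w <= capacity]
        while cand:
            # rank candidates by value density, keep the top alpha fraction
            cand.sort(key=lambda i: items[i][0] / items[i][1], reverse=True)
            rcl = cand[:max(1, int(alpha * len(cand)))]
            pick = random.choice(rcl)
            chosen.append(pick)
            weight += items[pick][1]
            cand = [i for i in cand if i not in chosen
                    and weight + items[i][1] <= capacity]
        return chosen

    def local_search(sol):
        # try swapping one chosen item for one unchosen item
        improved = True
        while improved:
            improved = False
            for i in sol:
                for j in range(len(items)):
                    if j in sol:
                        continue
                    cand = [k for k in sol if k != i] + [j]
                    if (sum(items[k][1] for k in cand) <= capacity
                            and value(cand) > value(sol)):
                        sol, improved = cand, True
                        break
                if improved:
                    break
        return sol

    best = []
    for _ in range(restarts):
        sol = local_search(construct())
        if value(sol) > value(best):
            best = sol
    return best, value(best)
```

Even on this tiny instance the local-search phase repairs a greedy-by-density construction that misses the optimum.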
Food Wholesales Prediction: What Is Your Baseline?

Sales prediction is an important problem for different companies involved in manufacturing, logistics, marketing, wholesaling and retailing. Different approaches have been suggested for food sales forecasting. Several researchers, including the authors of this paper, reported on the advantage of one type of technique over the others for a particular set of products. In this paper we demonstrate that, besides the already recognized challenge of building accurate predictive models, the evaluation procedures themselves should be considered more carefully. We give illustrative examples to show that e.g. the popular MAE and MSE estimates can be intuitive with one type of product and rather misleading with others. Furthermore, averaging errors across differently behaving products can also be counter-intuitive. We introduce new ways to evaluate the performance of wholesales prediction and discuss their biases with respect to different error types.

Jorn Bakker, Mykola Pechenizkiy
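The point about error estimates can be shown directly: MAE and MSE can rank the same two forecasting models in opposite orders when one model makes small consistent errors and the other makes rare large ones. The numbers below are invented for illustration.

```python
def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mse(actual, pred):
    """Mean squared error: large misses are penalized quadratically."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

actual = [10, 10, 10, 10]
model_a = [12, 12, 12, 12]   # consistently off by 2
model_b = [10, 10, 10, 16]   # exact, except one large miss
```

Here MAE prefers model B (1.5 vs 2.0) while MSE prefers model A (4.0 vs 9.0), so which model "wins" depends entirely on the estimate chosen.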

Complex Data

A Distributed Immunization Strategy Based on Autonomy-Oriented Computing

In recent years, immunization strategies have been developed for stopping epidemics in complex-network-like environments. So far, it remains difficult for the existing strategies to deal with distributed community networks, even though they are ubiquitous in the real world. In this paper, we propose a distributed immunization strategy based on the ideas of self-organization and positive feedback from Autonomy-Oriented Computing (AOC). The AOC-based strategy can effectively be applied to handle large-scale, dynamic networks. Our experimental results show that the autonomous entities deployed in this strategy can collectively find and immunize most of the highly-connected nodes in a network within just a few steps.

Jiming Liu, Chao Gao, Ning Zhong
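The centralized baseline against which such distributed strategies are judged is targeted immunization of the highest-degree nodes. A sketch of that baseline follows; the AOC entities themselves, which locate hubs without global knowledge, are not reproduced here.

```python
def immunize_top_degree(adj, k):
    """Pick the k highest-degree nodes of an adjacency-list graph
    (requires global knowledge, unlike the distributed AOC strategy)."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

def remove_nodes(adj, immunized):
    """Network left once the immunized nodes can no longer transmit."""
    immunized = set(immunized)
    return {n: [m for m in nbrs if m not in immunized]
            for n, nbrs in adj.items() if n not in immunized}
```

Immunizing the hub of a star-shaped community disconnects it, which is why finding highly-connected nodes quickly matters.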
Discovering Relevant Cross-Graph Cliques in Dynamic Networks

Several algorithms, namely CubeMiner, Trias, and Data-Peeler, have been recently proposed to mine closed patterns in ternary relations. We consider here the specific context where a ternary relation denotes the value of a graph adjacency matrix at different timestamps. Then, we discuss the constraint-based extraction of patterns in such dynamic graphs. We formalize the concept of δ-contiguous closed 3-clique and we discuss the availability of a complete algorithm for mining them. It is based on a specialization of the enumeration strategy implemented in Data-Peeler. Indeed, clique relevancy can be specified by means of a conjunction of constraints which can be efficiently exploited. The added-value of our strategy is assessed on a real dataset about a public bicycle renting system. The raw data encode the relationships between the renting stations during one year. The extracted δ-contiguous closed 3-cliques are shown to be consistent with our domain knowledge on the considered city.

Loïc Cerf, Tran Bao Nhan Nguyen, Jean-François Boulicaut
Statistical Characterization of a Computer Grid

Large-scale statistical analysis of more than 28 million jobs collected during 20 months of grid activity was undertaken in order to examine the relations between users, computing elements and jobs in the network. The results give insight into the global system behaviour and can be used to build models applicable in various contexts of grid computing. As an example, we here construct probabilistic models that prove to be able to accurately predict job abortion.

Lovro Ilijašić, Lorenza Saitta
On Social Networks Reduction

Since the availability of social network data and the range of these data have grown significantly in recent years, new aspects have to be considered. In this paper, we use a combination of Formal Concept Analysis and well-known matrix factorization methods to address the computational complexity of social network analysis and the clarity of its visualization. The goal is to reduce the dimension of social network data and to measure the amount of information lost during the reduction. An example containing real data demonstrates the feasibility of our approach.

Václav Snášel, Zdeněk Horák, Jana Kočíbová, Ajith Abraham
Networks Consolidation through Soft Computing

This paper reports the application of soft computing to redesign operations on customers’ clusters (sub-networks), since many customers may initially be located in a cluster with little intra-cluster traffic. Here we assume an existing network with a reconfigurable architecture, and we propose a number of redesign techniques to reduce the extra-traffic and maximize the intra-traffic within the clusters by considering customers’ movement and cluster consolidation. Furthermore, the proposed search approach is based on a Genetic Algorithm (GA) with an object-oriented chromosome representation. Our experimental results for a network of 50 customers show an average 22% reduction in extra-traffic through the proposed redesign operations.

Sami Habib, Paulvanna Nayaki Marimuthu, Mohammad Taha
Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels

This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of instances in the stream are labeled; a further realistic assumption is that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model is built in these scenarios, considering that the data stream also has an evolving nature? In our previous work we applied semi-supervised clustering to build classification models using a limited amount of labeled training data; however, it assumed that the data to be labeled are chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data proves the effectiveness of our approach.

Clay Woolam, Mohammad M. Masud, Latifur Khan
Novelty Detection from Evolving Complex Data Streams with Time Windows

Novelty detection in data stream mining denotes the identification of new or unknown situations in a stream of data elements flowing in continuously at a rapid rate. This work is a first attempt at investigating the anomaly detection task in (multi-)relational data mining. Defining a data block as the collection of complex data which periodically flow in the stream, a relational pattern base is incrementally maintained each time a new data block flows in. For each pattern, the time-consecutive support values collected over the data blocks of a time window are clustered; the clusters are then used to identify the novelty patterns which describe a change in the evolving pattern base. An application to the problem of detecting novelties in an Internet packet stream is discussed.

Michelangelo Ceci, Annalisa Appice, Corrado Loglisci, Costantina Caruso, Fabio Fumarola, Donato Malerba

General AI

On Computational Creativity, ‘Inventing’ Theorem Proofs

We provide a precise illustration of what the idea of “computational creativity” can be, that is, the whole set of methods by which a computer may simulate creativity. This paper is centered on the relationship between computational creativity and theorem proving. The basic tool for this kind of computational creativity is what we call an ‘asset generator’, a specification of which is given in Section 5, followed by a short description of our methodology for the generation of assets in theorem proving. In a sense, our ‘asset generation methodology’ relies essentially on making explicit the logician’s good sense while performing a constructive proof by recursion. Our contribution is making this good sense explicit and turning it into a systematic methodology.

Marta Fraňová, Yves Kodratoff
Revisiting Constraint Models for Planning Problems

Planning problems deal with finding a sequence of actions that transfer the initial state of the world into a desired state. Frequently such problems are solved by dedicated algorithms but there exist planners based on translating the planning problem into a different formalism such as constraint satisfaction or Boolean satisfiability and using a general solver for this formalism. The paper describes how to enhance existing constraint models of planning problems by using techniques such as symmetry breaking (dominance rules), singleton consistency, nogoods, and lifting.

Roman Barták, Daniel Toropila

Uncertainty

Interval-Valued Fuzzy Formal Concept Analysis

Fuzzy formal concept analysis is concerned with formal contexts expressing scalar-valued fuzzy relationships between objects and their properties. Existing fuzzy approaches assume that the relationship between a given object and a given property is a matter of degree in a scale L (generally [0,1]). However, the extent to which “object o has property a” may sometimes be hard to assess precisely. Then it is convenient to use a sub-interval from the scale L rather than a precise value. Such formal contexts naturally lead to interval-valued formal concepts. The aim of the paper is twofold. First, we provide a sound minimal set of requirements for interval-valued implications in order to fulfill the fuzzy closure properties of the resulting Galois connection. Secondly, a new approach based on a generalization of Gödel implication is proposed for building the complete lattice of all interval-valued formal concepts.

Yassine Djouadi, Henri Prade
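For reference, the scalar Gödel implication on [0,1] is x → y = 1 if x ≤ y, and y otherwise. One natural interval extension, shown here purely as an illustration and not necessarily the paper's exact generalization, applies the scalar implication bound-wise, pairing the largest antecedent with the smallest consequent for the lower bound.

```python
def godel(x, y):
    """Scalar Gödel implication on [0, 1]."""
    return 1.0 if x <= y else y

def godel_interval(x, y):
    """Illustrative interval extension: the implication is antitone in x
    and monotone in y, so the lower bound pairs the largest x with the
    smallest y, and the upper bound the smallest x with the largest y."""
    (x1, x2), (y1, y2) = x, y
    return (godel(x2, y1), godel(x1, y2))
```

The returned pair is again a sub-interval of [0,1], which is what an interval-valued Galois connection needs to operate on.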
Application of Meta Sets to Character Recognition

A new approach to character recognition problem, based on meta sets, is introduced and developed. For the given compound character pattern consisting of a number of character samples accompanied by their corresponding quality degrees, and for the given testing character sample, the main theorem of the paper gives means to evaluate the correlation between the testing sample and the compound pattern. It also enables calculation of similarity degrees of the testing sample to each pattern element. The quality degrees and the correlation are expressed by means of membership degrees of meta sets representing samples in the meta set representing the compound pattern. The similarity degrees are expressed as equality degrees of these meta sets.

The meta set theory is a new alternative to the fuzzy set theory. By the construction of its fundamental notions it is directed to efficient computer implementations. This paper presents an example of application of the theory to a real-life problem.

Bartłomiej Starosta
A General Framework for Revising Belief Bases Using Qualitative Jeffrey’s Rule

Intelligent agents require methods to revise their epistemic state as they acquire new information. Jeffrey’s rule, which extends conditioning to uncertain inputs, is currently used for revising probabilistic epistemic states when new information is uncertain. This paper analyses the expressive power of two possibilistic counterparts of Jeffrey’s rule for modeling belief revision in intelligent agents. We show that this rule can be used to recover most of the existing approaches proposed for knowledge base revision, such as adjustment, natural belief revision, drastic belief revision, and the revision of one epistemic state by another. In addition, we show that some recent forms of revision, namely improvement operators, can also be recovered in our framework.

Salem Benferhat, Didier Dubois, Henri Prade, Mary-Anne Williams
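For reference, the probabilistic Jeffrey's rule that the paper transposes to the possibilistic setting: given a partition {B_i} of the worlds and new uncertain input q_i = P'(B_i), the revised probability is P'(A) = Σ_i P(A | B_i) · q_i, i.e. each cell's mass is rescaled to q_i while the relative weights inside the cell are preserved. A minimal sketch on a finite world set:

```python
def jeffrey_revise(p, partition, q):
    """Jeffrey's rule on a finite set of worlds.

    p: dict world -> prior probability
    partition: dict label -> set of worlds (a partition of the worlds)
    q: dict label -> new probability assigned to that cell
    Returns the revised distribution over worlds.
    """
    revised = {}
    for label, cell in partition.items():
        cell_mass = sum(p[w] for w in cell)
        for w in cell:
            # within each cell, conditioning preserves relative weights
            revised[w] = q[label] * (p[w] / cell_mass)
    return revised
```

When q matches the prior cell masses the rule leaves the distribution unchanged, which is the minimal-change behaviour the possibilistic counterparts aim to reproduce.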
Backmatter
Metadata
Title
Foundations of Intelligent Systems
Edited by
Jan Rauch
Zbigniew W. Raś
Petr Berka
Tapio Elomaa
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04125-9
Print ISBN
978-3-642-04124-2
DOI
https://doi.org/10.1007/978-3-642-04125-9
