
2006 | Book

Intelligent Information Processing and Web Mining

Proceedings of the International IIS: IIPWM’06 Conference held in Ustroń, Poland, June 19-22, 2006

Edited by: Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Krzysztof Trojanowski

Publisher: Springer Berlin Heidelberg

Book series: Advances in Intelligent and Soft Computing


About this book

This volume contains selected papers presented at the international conference on Intelligent Information Processing and Web Mining, IIS:IIPWM’06, organized in Ustroń (Poland) on June 19-22, 2006. The event was organized by the Institute of Computer Science of the Polish Academy of Sciences, a leading Polish research institution in fundamental and applied research in the areas of Artificial Intelligence (AI) and Information Systems (IS), in cooperation with a number of scientific and business institutions. It was a continuation of a series of conferences on these subjects, initiated by Prof. M. Dąbrowski and Dr. M. Michalewicz in 1992. The conference was addressed primarily to those who are active in Artificial Immune Systems (AIS) and other biologically motivated methods, Computational Linguistics (CL), Web technologies (WT), and Knowledge Discovery (KD), and all kinds of interactions between those fields. The submitted papers covered new computing paradigms, among others in biologically motivated methods, advanced data analysis, new machine learning paradigms, natural language processing, new optimization technologies, and applied data mining using statistical and non-standard approaches. The papers give an overview of a wide range of applications for intelligent systems: in operating systems design, in network security, for information extraction from multimedia (sound, graphics), in financial market analysis, in medicine, in geo-science, etc. Though numerous papers were submitted, only a fraction of them (about 40%) was accepted for publication in a rigorous reviewing process.

Table of Contents

Frontmatter

Regular Sessions: Artificial Immune Systems

Frontmatter
Comparing Energetic and Immunological Selection in Agent-Based Evolutionary Optimization

This paper presents the idea of an immunological selection mechanism for agent-based evolutionary computation. General considerations are illustrated by a particular system dedicated to function optimization. Selected experimental results allow for a comparison of the performance of immune-inspired selection mechanisms and classical energetic ones.

Aleksander Byrski, Marek Kisiel-Dorohinicki
An Immunological and an Ethically-social Approach to Security Mechanisms in a Multiagent System

This article discusses security mechanisms in agent and multiagent systems, focusing on the design of an artificial immune system for intrusion detection in agent systems. An immunological approach to change detection seems very useful in the design of security mechanisms for an agent functioning in its environment. The reason for this expectation is that the principles of a computer immune system, such as distribution and autonomy, are strongly connected with the main principles of agent technology: the autonomy of an agent and, in the case of a multiagent system, distribution.

Krzysztof Cetnarowicz, Renata Cięciwa, Gabriel Rojek
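The immunological change-detection idea above can be illustrated with the classic negative selection scheme: candidate detectors are generated at random and kept only if they fail to match any "self" (normal-behaviour) pattern. The r-contiguous-bits matching rule and the toy bit strings below are generic AIS illustrations, not the system from the paper.

```python
import random

def negative_selection(self_set, n_detectors, length, r, seed=0):
    """Generate detectors that match no 'self' string under the
    r-contiguous-bits rule (a generic AIS sketch)."""
    def matches(a, b):
        run = 0
        for x, y in zip(a, b):
            run = run + 1 if x == y else 0
            if run >= r:          # r contiguous agreeing bits => match
                return True
        return False

    rng = random.Random(seed)
    detectors = []
    while len(detectors) < n_detectors:
        cand = tuple(rng.randint(0, 1) for _ in range(length))
        # keep the candidate only if it matches no self pattern
        if not any(matches(cand, s) for s in self_set):
            detectors.append(cand)
    return detectors

# 'self' = normal behaviour patterns; anything matching them is tolerated
self_set = [(0,) * 8, (1,) * 8]
detectors = negative_selection(self_set, n_detectors=5, length=8, r=4)
```

Any bit string that later matches a detector (rather than self) is flagged as an anomaly, which is what gives the approach its distributed, autonomous character.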
Randomized Dynamic Generation of Selected Melanocytic Skin Lesion Features

In this paper, the methodology of generating images of melanocytic skin lesions is briefly outlined. The developed methodology proceeds essentially in two steps. In the first one, semantic descriptions of skin lesions of anonymous patients are carefully analyzed to catch important features (symptoms) and to mine their logical values. Then, data gained in this step are used to control a specific simulation process, in which the simulated lesion’s image is randomly put together from a priori pre-defined fragments (textures). In this way, a single textual vector representing a distinct lesion can produce a collection of several images of a given category. The quality of the simulated images, verified by an independent expert, was found to be quite satisfactory.

Zdzisław S. Hippe, Jerzy W. Grzymała-Busse, Ł. Piątek
Controlling Spam: Immunity-based Approach

Using electronic mail (e-mail) we can communicate freely and almost at no cost. This creates new possibilities for companies, which can use e-mail to send advertisements to their clients (so-called direct mailing). The term spam refers mostly to that kind of advertisement. Massively sent unsolicited e-mails afflict many Internet users. Unfortunately, this kind of message cannot be filtered out by simple rule-based filters. In this paper we extend the artificial immune system (AIS) proposed in [6], which is based on the mammalian immune system and designed to protect users from spam. More generally, AIS are also used to detect computer viruses or anomalies in computer networks.

Konrad Kawecki, Franciszek Seredyński, Marek Pilski
A Comparison of Clonal Selection Based Algorithms for Non-Stationary Optimisation Tasks

The mammalian immune system, and especially the clonal selection principle responsible for coping with external intruders, is an inspiration for a set of heuristic optimization algorithms. Below, a few of them are compared on a set of non-stationary optimization benchmarks. One of the algorithms is our proposal, called AIIA (Artificial Immune Iterated Algorithm). We compare two versions of this algorithm with two other well-known algorithms. The results show that all the algorithms based on the clonal selection principle can be quite efficient tools for non-stationary optimization.

Krzysztof Trojanowski, Sławomir T. Wierzchoń
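A minimal clonal-selection optimizer of the kind these algorithms build on can be sketched in a few lines: good antibodies are cloned, the clones are mutated (better antibodies mutate less), improvements are kept, and the worst antibodies are replaced by random newcomers. The parameter values and the sphere test function are illustrative assumptions, not the AIIA algorithm itself.

```python
import random

def clonal_selection(fitness, dim, pop_size=20, n_clones=5,
                     generations=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal CLONALG-style minimizer (an illustrative sketch)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]
        for rank, ab in enumerate(survivors):
            # rank-based hypermutation: best antibodies mutate least
            sigma = 0.5 * (rank + 1) / len(survivors)
            for _ in range(n_clones):
                clone = [x + rng.gauss(0.0, sigma) for x in ab]
                if fitness(clone) < fitness(ab):
                    ab = clone
            survivors[rank] = ab
        # diversity: refill the population with random newcomers
        while len(survivors) < pop_size:
            survivors.append([rng.uniform(lo, hi) for _ in range(dim)])
        pop = survivors
    return min(pop, key=fitness)

sphere = lambda x: sum(v * v for v in x)
best = clonal_selection(sphere, dim=3)
```

The random newcomers are also what makes such schemes attractive for the non-stationary benchmarks mentioned above: they keep probing the search space after the optimum moves.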

Regular Sessions: Evolutionary Methods

Frontmatter
On Asymptotic Behaviour of a Simple Genetic Algorithm

The simple genetic algorithm (SGA) and its convergence analysis are the main subjects of the article. The SGA is defined on a finite multi-set of potential problem solutions (individuals), together with random mutation and selection operators. The selection operation acts on the basis of the fitness function defined on potential solutions (individuals), and is fundamental for the problem considered. Generation of a new population from the given one is realized by the iterative actions of those operators. Each iteration is written in the form of a transition operator acting on probability vectors which describe the probability distributions of the populations. The transition operator is a Markov operator. Thanks to the well-developed theory of Markov operators [5,8,9], new conditions for the stability of the transition operator are formulated. The obtained results apply to the whole class of genetic operators and are not restricted to binary operators.

Witold Kosiński, Stefan Kotowski, Jolanta Socała
Evolutionary Algorithm of Radial Basis Function Neural Networks and Its Application in Face Recognition

This paper proposes a new evolutionary algorithm (EA) which includes five different mutation operators: node merging, node deletion, penalizing, node insertion and hybrid training. The algorithm adaptively determines the structure and parameters of radial basis function neural networks (RBFN). Many different radial basis functions with different sizes (covering areas, locations and orientations) were used to construct a near-optimal RBFN during training. The resulting RBFN is even more powerful and requires fewer nodes than those produced by other algorithms. Simulation results in face recognition show that the system achieves excellent performance both in terms of classification error rates and learning efficiency.

Jianyu Li, Xianglin Huang, Rui Li, Shuzhong Yang, Yingjian Qi
GAVis System Supporting Visualization, Analysis and Solving Combinatorial Optimization Problems Using Evolutionary Algorithms

The paper presents the GAVis (Genetic Algorithm Visualization) system designed to support solving combinatorial optimization problems using evolutionary algorithms. One of the main features of the system is tracking complex dependencies between parameters of an implemented algorithm with the use of visualization. The role of the system is shown by its application to two problems: the multiprocessor scheduling problem and the Travelling Salesman Problem (TSP).

Piotr Świtalski, Franciszek Seredyński, Przemysław Hertel

Regular Sessions: Computational Linguistics

Frontmatter
Gazetteer Compression Technique Based on Substructure Recognition

Finite-state automata are the state-of-the-art representation of dictionaries in natural language processing. We present a novel compression technique that is especially useful for gazetteers, a particular sort of dictionary. We replace common substructures in the automaton with unique copies. To find them, we treat the transition vector as a string and apply a Ziv-Lempel-style text compression technique that uses a suffix tree to find repetitions in linear time. Empirical evaluation on real-world data reveals space savings of up to 18.6%, which makes this method highly attractive.

Jan Daciuk, Jakub Piskorski
WFT – Context-Sensitive Speech Signal Representation

Progress in the development of automatic speech recognition (ASR) systems is made, inter alia, by using signal representations sensitive to ever more sophisticated features. This paper gives an overview of our investigation of a new context-sensitive speech signal representation based on the wavelet-Fourier transform (WFT), and proposes quality measures for it. The paper is divided into five sections, introducing in turn: phonetic-acoustic contextuality in speech, the basics of the WFT, the WFT speech signal feature space, feature space quality measures and, finally, conclusions from our achievements.

Jakub Gałka, Michał Kępiński

Regular Sessions: Web Technologies

Frontmatter
Adaptive Document Maps

As document map creation algorithms like WebSOM are computationally expensive, and hardly reconstructible even from the same set of documents, a new methodology is needed to construct document maps that can handle streams of new documents entering the document collection. This challenge is dealt with in this paper: in a multi-stage process, the incrementality of a document map is warranted. The quality of the map generation process has been investigated based on a number of clustering and classification measures. Conclusions concerning the impact of the incremental, topic-sensitive approach on map quality are drawn.

Krzysztof Ciesielski, Michał Dramiński, Mieczysław A. Kłopotek, Dariusz Czerski, Sławomir T. Wierzchoń
Faster Frequent Pattern Mining from the Semantic Web

In this paper we propose a method for frequent pattern discovery from knowledge bases represented in OWL DLP. OWL DLP, also known as Description Logic Programs, is the intersection of the expressivity of OWL DL and Logic Programming. Our method is based on a special form of the trie data structure. A similar structure was used for frequent pattern discovery in classical and relational data mining settings, giving a significant gain in efficiency. Our approach is illustrated on an example ontology.

Joanna Józefowska, Agnieszka Ławrynowicz, Tomasz Łukaszewski
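The abstract notes that a similar trie was used for classical frequent pattern mining; that classical setting can be sketched directly. In the level-wise miner below, each trie path spells one sorted candidate itemset, so counting a transaction is a walk down the trie rather than a scan over all candidates. The transactions and threshold are invented for illustration.

```python
def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining with candidates stored in a trie."""
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]
    frequent = {}
    while candidates:
        trie = {}
        for cand in candidates:               # build the candidate trie
            node = trie
            for item in cand:
                node = node.setdefault(item, {})
            node['#'] = 0                     # support counter at the leaf

        def walk(node, t, start):             # count one transaction
            if '#' in node:
                node['#'] += 1
            for j in range(start, len(t)):
                child = node.get(t[j])
                if child is not None:
                    walk(child, t, j + 1)

        for t in transactions:
            walk(trie, tuple(sorted(t)), 0)

        level = {}

        def harvest(node, prefix):            # read supports back out
            for key, child in node.items():
                if key == '#':
                    if child >= min_support:
                        level[prefix] = child
                else:
                    harvest(child, prefix + (key,))

        harvest(trie, ())
        frequent.update(level)
        keys = sorted(level)                  # join step for the next level
        candidates = [a + (b[-1],) for a in keys for b in keys
                      if a[:-1] == b[:-1] and a[-1] < b[-1]]
    return frequent

trans = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
freq = frequent_itemsets(trans, min_support=3)
```

The paper's contribution is adapting this kind of structure to patterns over OWL DLP knowledge bases rather than flat transactions.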
Collective Behaviour of Cellular Automata Rules and Symmetric Key Cryptography

Cellular automata (CA) are applied in cryptographic systems. A genetic algorithm (GA) is used to search, within a predefined set of rules, for new subsets of rules controlling the CA. High-quality pseudorandom number sequences (PNSs) are generated by the CA applying the new subsets of rules. The discovered subsets form a very efficient cryptographic module used as a pseudorandom number sequence generator (PNSG). Bad subsets of rules are also discovered and eliminated.

Mirosław Szaban, Franciszek Seredyński, Pascal Bouvry
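As a concrete illustration of CA-based pseudorandom generation, the sketch below runs a single elementary CA rule and emits the centre cell of each configuration as one output bit. Rule 30 is a standard textbook choice for this construction, not necessarily one of the rules discovered by the paper's GA; width and step count are arbitrary.

```python
def ca_prng(rule=30, width=64, steps=128):
    """Pseudorandom bit stream from an elementary cellular automaton:
    the centre cell of each successive configuration is one output bit."""
    rule_bits = [(rule >> i) & 1 for i in range(8)]   # rule lookup table
    cells = [0] * width
    cells[width // 2] = 1                             # single-seed start
    out = []
    for _ in range(steps):
        out.append(cells[width // 2])
        # next configuration, periodic boundary conditions
        cells = [rule_bits[(cells[(i - 1) % width] << 2)
                           | (cells[i] << 1)
                           | cells[(i + 1) % width]]
                 for i in range(width)]
    return out

bits = ca_prng()
```

The paper's GA, in effect, searches for subsets of such rules whose collective bit streams pass statistical randomness tests.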

Regular Sessions: Foundations of Knowledge Discovery

Frontmatter
Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach

In this paper we distinguish three different types of missing attribute values: lost values (e.g., erased values), “do not care” conditions (attribute values that were irrelevant for the classification of a case), and attribute-concept values (“do not care” conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: interpreting missing attribute values in three different ways and using both lower and upper, subset and concept approximations (note that subset lower approximations are identical to concept lower approximations). Additionally, cases with more than approximately 70% of missing attribute values were removed from the original data sets and then all nine strategies were applied again. Our conclusions are that any two of our nine strategies are incomparable in terms of error rates (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.

Jerzy W. Grzymała-Busse, Steven Santoso
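The "do not care" interpretation and the concept approximations mentioned above can be sketched in a few lines. A missing value `*` matches any value, each case gets a characteristic set of cases indistinguishable from it, and the concept's lower and upper approximations are unions of characteristic sets. The tiny two-attribute decision table is an invented example, not data from the paper's experiments.

```python
def characteristic_sets(cases, missing='*'):
    """K(x): cases indistinguishable from x when '*' ("do not care")
    matches any value."""
    return {i: {j for j, b in enumerate(cases)
                if all(x == missing or y == missing or x == y
                       for x, y in zip(a, b))}
            for i, a in enumerate(cases)}

def concept_approximations(cases, concept, missing='*'):
    """Concept lower/upper approximations built from characteristic sets:
    lower = union of K(x) fully inside the concept, upper = union of all
    K(x) for x in the concept."""
    K = characteristic_sets(cases, missing)
    lower, upper = set(), set()
    for i in concept:
        if K[i] <= concept:
            lower |= K[i]
        upper |= K[i]
    return lower, upper

# toy incomplete table, attributes (Temperature, Headache); '*' = do not care
cases = [('high', 'yes'), ('high', '*'), ('low', 'no'), ('*', 'no')]
flu = {0, 1}                                  # cases with decision "flu"
lower, upper = concept_approximations(cases, flu)
```

Rules certain for the concept are then induced from the lower approximation, and possible rules from the upper one.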
Tableau Method with Free Variables for Intuitionistic Logic

In this paper, we address proof search in tableaux with free variables for intuitionistic logic by introducing the notion of an admissible substitution into a quantifier-free calculus. Admissibility of a substitution is determined by the quantifier structure of given formulae and by dependencies between variables in the substitution. With this notion of admissibility, we avoid the need for both Skolemisation and checking different possible orders of quantifier rule applications. We demonstrate our approach on a series of examples.

Boris Konev, Alexander Lyaletski
A Similarity Measure between Tandem Duplication Trees

This paper opens the gate to understanding the nature of the unequal crossing-over process, which is one of the mechanisms that leads to the creation of new genes. Data mining and tree mining approaches are modified to fit this particular biological problem. The novel notions of the similarity of a duplication process and the similarity of a duplication region are proposed and settled as the foundation of further analysis. The role and applications of the duplication process similarity measure are discussed. A roadmap for further extensive studies, together with first interesting results, is presented.

Jakub Koperwas, Krzysztof Walczak
Finding Optimal Decision Trees

This paper presents a new algorithm that finds the generative model of a decision tree from data. We show that for infinite data and a finite number of attributes the algorithm always finds the generative model (i.e. the model of the decision tree from which the data were generated), except for a measure-zero set of distributions. The algorithm returns reasonable results even when the above-mentioned assumptions are not satisfied. The algorithm is polynomial in the number of leaves of the generative model, compared to the exponential complexity of the trivial exhaustive search algorithm. A similar result was recently obtained for learning Bayesian networks from data [1,2]. An experimental comparison of the new algorithm with the CART standard on both simulated and real data is shown; the new algorithm shows significant improvements over the CART algorithm in both cases. For simplicity the whole paper is restricted to binary variables, but it can be easily generalized.

Petr Máša, Tomáš Kočka
Attribute Number Reduction Process and Nearest Neighbor Methods in Machine Learning

Several nearest neighbor methods were applied to the process of decision making on E522144 and modified bases, which are collections of cases of melanocytic skin lesions. The modification of the bases consists in reducing the number of base attributes from 14 to 13, 4, 3, 2 and finally 1. The reduction process consists in concatenations of the values of particular attributes. The influence of this process on the quality of the decision making process is reported in the paper.

Aleksander Sokołowski, Anna Gładysz
The Use of Compound Attributes in AQ Learning

Compound attributes are named groups of attributes that have been introduced in Attributional Calculus (AC) to facilitate learning descriptions of objects whose components are characterized by different subsets of attributes. The need for such descriptions appears in many practical applications. A method for handling compound attributes in AQ learning and testing is described and illustrated by examples.

Janusz Wojtusiak, Ryszard S. Michalski

Regular Sessions: Statistical Methods in Knowledge Discovery

Frontmatter
Residuals for Two-Way Contingency Tables, Especially Those Computed for Multiresponses

The problem of testing statistical hypotheses of the independence of two multiresponse variables is considered. This is a specific inferential environment for analyzing certain patterns, particularly in questionnaire data. A data analyst normally looks for certain combinations of responses being chosen by responders more frequently than others. A formalism adequate for considerations of this kind is connected with the calculation of p-values within the so-called post-hoc analysis. Since this analysis is often connected with only one cell of an appropriate contingency table, we consider the residual (or deviate) of this cell. As a result of theoretical and experimental study, we propose algorithms that can be effective for the problem. We consider the case of 2 × 3 tables. Some aspects are relevant also for classical, i.e. uniresponse, contingency tables.

Guillermo Bali Ch., Dariusz Czerski, Mieczysław A. Kłopotek, Andrzej Matuszewski
Visualizing Latent Structures in Grade Correspondence Cluster Analysis and Generalized Association Plots

The latent structure of a psychological data set concerning superstitions is investigated by means of two recent exploratory methods: Grade Correspondence Cluster Analysis (GCCA) and Generalized Association Plots (GAP). The paper compares the visualized results of GCCA and GAP. Moreover, it shows what differentiates the two methodologies and what is their intrinsic similarity, according to which the revealed latent structures become equivalent whenever the data set is sufficiently regular. Therefore, on the basis of the real data set, two types of highly regular simulated data were constructed, of the same size and the same multivariate dependence index. These simulated data were then analyzed.

Wieslaw Szczesny, Marek Wiech
Converting a Naive Bayes Model into a Set of Rules

Knowledge representation based on probability theory is currently the most popular way of handling uncertainty. However, rule-based systems are still popular, and their advantage is that rules are usually easier to interpret than probabilistic models. A conversion method would allow exploiting the advantages of both techniques. In this paper an algorithm that converts Naive Bayes models into rule sets is proposed. Preliminary experimental results show that rules generated from Naive Bayes models are compact and that the accuracy of such rule-based classifiers is relatively high.

Bartłomiej Śnieżyński
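The flavour of such a conversion can be shown with a deliberately simplified scheme (not the paper's algorithm): from the Naive Bayes count model, emit a rule "IF attribute=value THEN class" whenever the Laplace-smoothed conditional class probability clears a confidence threshold. The weather-style data set is invented.

```python
from collections import Counter, defaultdict

def nb_to_rules(data, min_confidence=0.8):
    """Turn Naive Bayes-style counts into single-condition rules
    (a simplified illustration of model-to-rules conversion)."""
    class_counts = Counter(cls for _, cls in data)
    classes = sorted(class_counts)
    value_class = defaultdict(Counter)    # (attr index, value) -> class counts
    for features, cls in data:
        for i, v in enumerate(features):
            value_class[(i, v)][cls] += 1
    rules = []
    for (i, v), counts in sorted(value_class.items()):
        total = sum(counts.values())
        for cls in classes:
            # Laplace-smoothed P(class | attribute i = v)
            conf = (counts[cls] + 1) / (total + len(classes))
            if conf >= min_confidence:
                rules.append((i, v, cls, round(conf, 3)))
    return rules

data = [(('sunny', 'hot'), 'no'), (('sunny', 'mild'), 'no'),
        (('rain', 'mild'), 'yes'), (('rain', 'cool'), 'yes'),
        (('rain', 'mild'), 'yes')]
rules = nb_to_rules(data, min_confidence=0.7)
```

A full conversion, as the abstract implies, also has to handle conjunctions of conditions while keeping the rule set compact.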

Regular Sessions: Knowledge Discovery in Applications

Frontmatter
Improving Quality of Agglomerative Scheduling in Concurrent Processing of Frequent Itemset Queries

Frequent itemset mining is often regarded as advanced querying, where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, the problem of optimizing the processing of batches of frequent itemset queries has been considered. The best technique proposed so far for this problem is Common Counting, which consists in concurrent processing of frequent itemset queries and integrating their database scans. Common Counting requires that the data structures of several queries be stored in main memory at the same time. Since in practice memory is limited, the crucial problem is scheduling the queries to Common Counting phases so that the I/O cost is optimized. According to our previous studies, the best algorithm for this task applicable to large batches of queries is CCAgglomerative. In this paper we present a novel query scheduling method, CCAgglomerativeNoise, built around CCAgglomerative, which increases its chances of finding an optimal solution.

Pawel Boinski, Konrad Jozwiak, Marek Wojciechowski, Maciej Zakrzewicz
Analysis of the Structure of Online Marketplace Graph

In this paper the structure of the online marketplace graph is studied. We based our research on one of the biggest and most popular online auction services in Poland. Our graph is created from data obtained from the Transaction Rating System of this service. We discuss the properties of the considered graph and its dynamics. It turns out that the graph has a scale-free topology and shows small-world behaviour. We also discovered a few interesting features (e.g. a high mutual clustering coefficient) which are not present in other real-life networks.

Andrzej Dominik, Jacek Wojciechowski
Trademark Retrieval in the Presence of Occlusion

Employing content-based image retrieval (CBIR) methods in trademark registration can greatly improve and accelerate the checking process. Amongst all the features present in CBIR, shape seems the most appropriate for this task. It is, however, usually only utilized for non-occluded and noise-free objects. In this paper the emphasis is put on the atypical case of the fraudulent creation of a new trademark based on a popular registered one. One can simply modify an existing logo by, for example, removing a part or inserting a new one. Another method is to modify even smaller subparts, which is close to adding noise to its silhouette. Hence, a method is described here of template matching using a shape descriptor which is robust to rotation, scaling and shifting, and also to occlusion and noise.

Dariusz Frejlichowski
On Allocating Limited Sampling Resources Using a Learning Automata-based Solution to the Fractional Knapsack Problem

In this paper, we consider the problem of allocating limited sampling resources in a “real-time” manner with the purpose of estimating multiple binomial proportions. This is the scenario encountered when evaluating multiple web sites by accessing a limited number of web pages, where the proportions of interest are the fractions of each web site successfully validated by an HTML validator [11]. Our novel solution is based on mapping the problem onto the so-called nonlinear fractional knapsack problem with separable and concave criterion functions [3], which, in turn, is solved using a team of deterministic Learning Automata (LA). To render the problem even more meaningful, since the binomial proportions are unknown and must be sampled, we particularly consider the scenario when the target criterion functions are stochastic with unknown distributions. Using the general LA paradigm, our scheme improves a current solution in an online manner, through a series of informed guesses which move towards the optimal solution. At the heart of our scheme, a team of deterministic LA performs a controlled random walk on a discretized solution space. Comprehensive experimental results demonstrate that the discretization resolution determines the precision of our scheme, and that for a given precision, the current resource allocation solution is consistently improved until a near-optimal solution is found, even for periodically switching environments. Thus our scheme, while being novel to the entire field of LA, also efficiently handles a class of resource allocation problems previously not addressed in the literature.

Ole-Christoffer Granmo, B. John Oommen
Learning Symbolic User Models for Intrusion Detection: A Method and Initial Results

This paper briefly describes the LUS-MT method for automatically learning user signatures (models of computer users) from datastreams capturing users’ interactions with computers. The signatures are in the form of collections of multistate templates (MTs), each characterizing a pattern in the user’s behavior. By applying the models to new user activities, the system can detect an imposter or verify legitimate user activity. Advantages of the method include the high expressive power of the models (a single template can characterize a large number of different user behaviors) and the ease of their interpretation, which makes possible their editing or enhancement by an expert. Initial results are very promising and show the potential of the method for user modeling.

Ryszard S. Michalski, Kenneth A. Kaufman, Jaroslaw Pietrzykowski, Bartłomiej Śnieżyński, Janusz Wojtusiak
Multichannel Color Image Watermarking Using PCA Eigenimages

In the field of image watermarking, research has mainly focused on gray-scale image watermarking, whereas the extension to the color case is usually accomplished by marking the image luminance or by processing color channels separately. In this paper we propose a new digital watermarking method for three-band RGB color images based on Principal Component Analysis (PCA). This research, which is an extension of our earlier work, consists in embedding the same digital watermark into the three RGB channels of the color image based on PCA eigenimages. We evaluated the effectiveness of the method against several watermark attacks. Experimental results show that the performance of the proposed method against the most prominent attacks is good.

Kazuyoshi Miyara, Thai Duy Hien, Hanane Harrak, Yasunori Nagata, Zensho Nakao
Developing a Model Agent-based Airline Ticket Auctioning System

A large body of recent work has been devoted to multi-agent systems utilized in e-commerce scenarios. In particular, autonomous software agents participating in auctions have attracted a lot of attention. Interestingly, most of these studies involve purely virtual scenarios. In an initial attempt to fill this gap, we discuss a model agent-based e-commerce system modified to serve as an airline ticket auctioning system. The implications of forcing agents to obey the actual rules that govern ticket sales are discussed and illustrated by UML-formalized depictions of agents, their relations and functionalities.

Mladenka Vukmirovic, Maria Ganzha, Marcin Paprzycki
Multi-Label Classification of Emotions in Music

This paper addresses the problem of multi-label classification of emotions in musical recordings. The testing data set contains 875 samples (30 seconds each). The samples were manually labelled into 13 classes, without limits regarding the number of labels for each sample. The experiments and test results are presented.

Alicja Wieczorkowska, Piotr Synak, Zbigniew W. Raś
Wrapper Maintenance for Web-Data Extraction Based on Pages Features

Extracting data from Web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest. There are two main issues relevant to Web-data extraction, namely wrapper generation and wrapper maintenance. In this paper, we propose a novel approach to automatic wrapper maintenance. It is based on the observation that despite various page changes, many important features of the pages are preserved, such as text pattern features, annotations, and hyperlinks. Our approach uses these preserved features to identify the locations of the desired values in the changed pages, and repairs wrappers correspondingly. Experiments over several real-world Web sites show that the proposed automatic approach can effectively maintain wrappers to extract desired data with high accuracy.

Shunxian Zhou, Yaping Lin, Jingpu Wang, Xiaolin Yang

Poster Session

Frontmatter
Parsing Polish as a Context-Free Language

A set of 974 lexical symbols is defined which may appear in Polish text. Based on this set, a context-free grammar is constructed whose Chomsky normal form possesses 755 variables, 490 terminals and 1790 productions. The probabilities of these productions are estimated using over 40000 unparsed sentences. It turns out that a parsing algorithm using the resulting probabilistic context-free grammar parses about 1/4 of the sentences correctly.

Stanisław Galus
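A probabilistic grammar in Chomsky normal form, as described above, is typically applied with the standard CYK algorithm, which the sketch below implements. The miniature Polish lexicon and two-rule grammar are invented for illustration and are not the paper's 1790-production grammar.

```python
from collections import defaultdict

def cyk_parse(words, lexicon, rules, start='S'):
    """Probabilistic CYK over a CNF grammar: returns the probability of
    the best parse of the sentence, or 0.0 if there is none."""
    n = len(words)
    best = defaultdict(float)                 # (i, j, symbol) -> best prob
    for i, w in enumerate(words):             # terminal (lexical) rules
        for sym, p in lexicon.get(w, []):
            best[(i, i + 1, sym)] = max(best[(i, i + 1, sym)], p)
    for span in range(2, n + 1):              # binary rules, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in rules.items():
                    lp = best.get((i, k, left), 0.0)
                    rp = best.get((k, j, right), 0.0)
                    if lp and rp:
                        best[(i, j, parent)] = max(best[(i, j, parent)],
                                                   p * lp * rp)
    return best.get((0, n, start), 0.0)

# toy CNF grammar for a three-word Polish sentence (illustrative only)
lexicon = {'Jan': [('NP', 1.0)], 'czyta': [('V', 1.0)],
           'książkę': [('NP', 1.0)]}
rules = {('S', 'NP', 'VP'): 1.0, ('VP', 'V', 'NP'): 1.0}
p = cyk_parse(['Jan', 'czyta', 'książkę'], lexicon, rules)
```

Scaling this scheme to 755 variables and real sentences is exactly where the reported ~25% parsing accuracy comes into play.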
MAPa: a Language for Modelling Conversations in Agent Environments

In this paper we present the MAPa language for expressing dialogues in multiagent systems. This is accomplished by defining patterns of communication between groups of agents, expressed by protocols. Our language is directly implementable and allows one to specify the connection between communication and knowledge management in a way that is independent of the specific reasoning techniques used. Here we introduce the MAPa formal syntax and point out the added features with respect to its predecessor, the MAP language.

María Adela Grando, Christopher D. Walton
Definiteness of Polish Noun Phrases Modified by Relative Clauses in the DRT Framework

In this paper, I investigate anaphoric coreference as one of the sources of definiteness of NPs. The main focus is on how relative clauses influence the process of finding an antecedent for an NP in the standard DRT framework. Tree-like indexes are adapted for this goal. The analysis is performed for an article-free language (Polish). Other sources of definiteness are not considered.

Elżbieta Hajnicz
DCT Watermarking Optimization by Genetic Programming

Embedding a digital watermark in an electronic document is proving to be a feasible solution for multimedia copyright protection and authentication purposes. However, the balance between the watermark’s robustness and its invisibility has always been a challenge for watermarkers. Consequently, it is necessary to use a powerful computation system that can guarantee the watermarking requirements. To this end, we propose to apply genetic programming to digital watermarking. In this work, we present a new watermarking scheme in the DCT domain based on genetic programming (GP). It is an optimizing structure which permits the automatic development of the embedding equation of a DCT algorithm possessing a high PSNR value and good robustness. Simulation results were satisfactory.

Hanane Harrak, Thai Duy Hien, Yasunori Nagata, Zensho Nakao
Searching Text Corpora with grep

The paper presents simple methods for performing pattern search on annotated text corpora. Elementary text processing techniques are applied, based on the use of the common text scanning tools flex and grep. The methods allow ambiguous annotation, as well as structured tags, to be handled properly. Processing times for some types of queries are comparable to those attained by elaborate search engines using indexing techniques with query languages of similar expressiveness.

Tomasz Obrębski
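The core trick of such grep-based corpus search can be sketched compactly: render each token as `form/tag`, one sentence per line, and queries become ordinary regular expressions over that encoding. The sketch below uses Python's `re` in place of grep; the two-sentence corpus and its tagset are invented assumptions.

```python
import re

def corpus_grep(corpus, pattern):
    """Return the encoded sentences matching a regular-expression query,
    emulating grep over a 'form/tag' text encoding of the corpus."""
    lines = [' '.join(f'{form}/{tag}' for form, tag in sent)
             for sent in corpus]
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

corpus = [
    [('Jan', 'noun'), ('czyta', 'verb'), ('książkę', 'noun')],
    [('pada', 'verb'), ('deszcz', 'noun')],
]
# query: a verb immediately followed by a noun
hits = corpus_grep(corpus, r'\S+/verb \S+/noun')
```

Ambiguous annotation fits the same encoding by listing alternative tags on a token (e.g. `form/tag1|tag2`) and widening the pattern accordingly.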
Network Traffic Analysis Using Immunological and Evolutionary Paradigms

The paper presents an approach to the anomaly detection problem based on the self-nonself space paradigm. A hyperrectangular structure is proposed as the description for self and nonself elements. A niching genetic algorithm is used for the generation of the detector set. Results of the conducted experiments show a high quality of intrusion detection, which outperforms that of a recently proposed approach based on a hypersphere representation of the self-space.

Marek Ostaszewski, Franciszek Seredyński, Pascal Bouvry
Event Detection in Financial Time Series by Immune-Based Approach

The paper presents a concept of applying the immune paradigm to the monitoring of a company’s environment. Short-time prediction of stock rates is used as a basic tool to watch for relevant events, viewed as switching between “healthy” and “ill” behavior of the monitored quotations. Two predictive formulas are applied alternately to recognize the kind of behavior. “Illness” detection rules are proposed, based on the prediction efficiency evaluated in moving windows. Parameters of the predictors are modified according to the immune paradigm.

Tomasz Pelech, Jan T. Duda
Automation of Communication and Co-operation Processes in Marine Navigation

The problem of information exchange and communication in marine navigation is presented. The authors propose a sub-ontology for automatic inter-ship communication as a supplement to the ontology of navigational information. Applying the sub-ontology improves information exchange and co-operation between navigators steering the ships. The automation makes it possible to reduce the impact of human errors resulting from failure to effectively communicate and coordinate actions; the human factor is often to blame for marine accidents. Applications in the ship communication and co-operation system are shown.

Zbigniew Pietrzykowski, Jaroslaw Chomski, Janusz Magaj, Grzegorz Niemczyk
Predictive Analysis of the pO2 Blood Gasometry Parameter Related to the Infants Respiration Insufficiency

The article presents an application of artificial immune algorithms to prediction of the pO2 arterial blood gasometry parameter, which is related to infant respiratory insufficiency. An artificial immune network algorithm created for this purpose allows time series prediction on vectorized data sets. The training data originates from the Infant Intensive Care Unit of the Polish-American Institute of Pediatrics, Collegium Medicum, Jagiellonian University in Cracow.

Wieslaw Wajs, Mariusz Swiecicki, Piotr Wais, Hubert Wojtowicz, Pawel Janik, Leszek Nowak
Application of Fuzzy Logic Theory to Geoid Height Determination

Geoid determination is nowadays an important scientific problem in the Geosciences. Ellipsoidal and orthometric heights are the height systems commonly used in geodesy. Ellipsoidal height, measured by satellite systems such as GPS and GLONASS, is reckoned from the ellipsoid, whereas orthometric height is measured from the geoid. While orthometric height has a physical meaning, ellipsoidal height has only a mathematical definition. Geoid height is the transformation parameter between these height systems and a tool for rational use of coordinates obtained from satellite measurements. Fuzzy logic theory has become popular in many scientific and engineering fields, and many geodetic problems have recently been solved using it. In this study, the theory and the calculation of geoid height by fuzzy logic using Matlab are explained, and a case study in Burdur (Turkey) is performed. The calculations are interpreted and discussed, and conclusions are drawn.

Mehmet Yılmaz, Mustafa Acar, Tevfik Ayan, Ersoy Arslan

Invited Session: Knowledge Base Systems

Frontmatter
On Greedy Algorithms with Weights for Construction of Partial Covers

In the paper, a modification of the greedy algorithm with weights for construction of partial covers is considered. Theoretical and experimental results relating to the accuracy of this algorithm are discussed.
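The basic greedy-with-weights scheme for partial covers can be sketched as follows. This is a generic illustration of the technique, not the authors' modification; the universe, subsets, weights, and the coverage fraction `alpha` are all invented for the example.

```python
def greedy_partial_cover(universe, subsets, weights, alpha=0.8):
    """Greedy weighted partial cover: repeatedly pick the subset with the
    best weight-per-newly-covered-element ratio until at least an alpha
    fraction of the universe is covered."""
    need = alpha * len(universe)
    covered, chosen = set(), []
    while len(covered) < need:
        best = min(
            (i for i in range(len(subsets)) if subsets[i] - covered),
            key=lambda i: weights[i] / len(subsets[i] - covered),
        )
        chosen.append(best)
        covered |= subsets[best]
    return chosen, covered

universe = set(range(10))
subsets = [set(range(0, 5)), set(range(4, 8)), set(range(7, 10))]
weights = [2.0, 1.0, 3.0]
picked, covered = greedy_partial_cover(universe, subsets, weights, alpha=0.8)
print(picked)   # [1, 0]: the cheap middle subset first, then the left one
```

Stopping at a fraction of the universe, rather than covering it completely, is what distinguishes a partial cover from the classic set cover greedy.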

Mikhail Ju. Moshkov, Marcin Piliszczuk, Beata Zielosko
Minimal Templates Problem

In 1976 Dempster and Shafer created a mathematical theory of evidence, called Dempster-Shafer theory. The theory is based on belief functions and plausible reasoning, and is used to combine separate pieces of information (evidence) to calculate the probability of an event. In 1982 Pawlak created rough set theory as an innovative mathematical tool for describing knowledge, including uncertain and inexact knowledge. In 1994 the basic functions of evidence theory were defined in terms of notions from rough set theory. This dependence between the theories has enabled further research on their practical use.

Barbara Marszał-Paszek, Piotr Paszek
The Inference Processes on Clustered Rules

In this paper the problem of a long and rather inefficient inference process is considered. A large data set, e.g. a large set of rules, makes the inference process lengthy. The paper presents the idea of a hierarchical structure of the knowledge base, where at each level of the hierarchy groups of similar rules are created. A cluster analysis method has been used to build the clusters of rules. The rule interpreter then no longer needs to search the whole set of rules one by one: it finds the most similar group of rules, and all inference processes work on this small, focused set of rules.

Agnieszka Nowak, Alicja Wakulicz-Deja
Extending Decision Units Conception Using Petri Nets

This paper discusses a method of using Petri nets as a tool for modelling the processes occurring during inference. The issue is part of a project extending the decision units model with the possibility of effective detection and visualisation of knowledge base verification results. The basic terms of Petri nets, as well as the idea of using Petri nets for modelling a rule knowledge base, are presented. The method of using Petri nets for modelling the inference process is also discussed. The summary includes a short discourse on foreseen directions of using Petri nets in verification of dynamic properties of rule knowledge bases, as well as the possibilities of using Petri nets to extend the properties of decision units.

Roman Siminski
Towards Modular Representation of Knowledge Base

This paper presents a conception of a fast and useful inference process in knowledge-based systems. The main known weakness is the long and unintelligent process of looking for rules during inference. The basic inference algorithm used by the rule interpreter tries to fit the facts to the rules in the knowledge base: it takes each rule and tries to execute it. As a result we receive a set of new facts, but it often contains redundant information unexpected by the user. The main goal of our work is to discover methods of controlling the inference process that allow us to obtain only the necessary decision information. The main idea is to create partitions of rules which can drive the inference process; that is why we use hierarchical clustering to agglomerate the rules.
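The cluster-driven inference idea can be illustrated with a toy forward-chaining step. This is a minimal sketch of the concept, not the authors' system; the Jaccard similarity, the cluster profiles, and the medical facts are assumptions made for the example.

```python
def similarity(facts, profile):
    """Jaccard similarity between the current facts and a cluster profile."""
    union = facts | profile
    return len(facts & profile) / len(union) if union else 0.0

def infer(clusters, facts):
    """Select the rule cluster most similar to the facts, then fire only
    its rules (each rule is a (premises, conclusion) pair)."""
    _, rules = max(clusters, key=lambda c: similarity(facts, c[0]))
    new_facts = set(facts)
    for premises, conclusion in rules:
        if premises <= new_facts:
            new_facts.add(conclusion)
    return new_facts

clusters = [
    ({"fever", "cough"}, [({"fever", "cough"}, "flu")]),
    ({"rash"}, [({"rash"}, "allergy")]),
]
print(infer(clusters, {"fever", "cough"}))  # infers 'flu' from the matching cluster
```

Instead of scanning every rule in the knowledge base, the interpreter touches only the cluster whose profile best matches the known facts, which is the source of the speed-up the paper aims at.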

Agnieszka Nowak, Roman Siminski, Alicja Wakulicz-Deja
Lazy Learning of Agent in Dynamic Environment

Many design problems are faced with large amounts of information and uncertainty, which in consequence lead to a large number of problem states, parameters and dependencies between them. Therefore, it is often hardly possible to model the problem in symbolic form using domain knowledge, or to find an acceptable solution on that basis. In many practical problems the decision support system is required to operate in a dynamically changing environment; the system has to deal with a continuous data flow, being itself situated in a spatio-temporal environment. In such cases, applying AI techniques and machine learning methods can be considered. In this paper we propose an approach that responds to this challenge by constructing a learning system based on the multiagent paradigm. The paper concentrates on the single-agent level, where a local lazy learning method has been analysed. The results of the experiments indicate the satisfactory efficiency of the proposed solution.

Wojciech Froelich
Artificial Neural Network Resistance to Incomplete Data

This paper presents results obtained in experiments on artificial neural networks. The networks have been trained with the delta-bar-delta and conjugate gradient algorithms after removing some data from the dataset and filling the empty places with the mean. The goal of the experiment was to observe how long a neural network (trained with a specific algorithm) remains able to learn as the dataset becomes consistently less and less exact, i.e. as the amount of incomplete data increases.

Magdalena Alicja Tkacz

Invited Session: Applications of Artificial Immune Systems

Frontmatter
Generalization Regions in Hamming Negative Selection

Negative selection is an immune-inspired algorithm which is typically applied to anomaly detection problems. We present an empirical investigation of the generalization capability of Hamming negative selection when combined with the r-chunk affinity metric. Our investigations reveal that when using the r-chunk metric, the length r is a crucial parameter and is inextricably linked to the input data being analyzed. Moreover, we propose that input data with different characteristics, i.e. different positional biases, can result in an incorrect generalization effect.
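The r-chunk matching rule itself is simple to state in code. Below is a minimal sketch of the matching step only (the self set, detectors, and r = 3 are invented for the example); generating the detectors by negative selection against a real self set is the hard part studied in the paper.

```python
def rchunk_matches(chunk, pos, string):
    """An r-chunk detector (chunk, position) matches a string if the chunk
    equals the substring of the same length starting at that position."""
    return string[pos:pos + len(chunk)] == chunk

def is_nonself(detectors, string):
    # detectors: (chunk, position) pairs produced by negative selection,
    # i.e. chosen so that no detector matches any self string.
    return any(rchunk_matches(chunk, pos, string) for chunk, pos in detectors)

# Hypothetical binary strings of length 4, with chunk length r = 3.
self_set = ["0000", "0001"]
detectors = [("111", 0), ("110", 1)]   # neither matches a self string
print(is_nonself(detectors, "1110"))   # True
print(is_nonself(detectors, "0001"))   # False
```

The paper's point is visible even here: the choice of r controls how many strings each detector generalizes over, so it must be tuned to the structure of the input data.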

Thomas Stibor, Jonathan Timmis, Claudia Eckert
How Can We Simulate Something As Complex As the Immune System?

We first establish the potential usefulness of simulation in immunological research, and then explore some of the problems that are preventing its widespread use. We suggest solutions for each of these problems, and illustrate both problems and solutions with an example from our own research – an experiment that tests a novel theory of immunological memory, in which our simulation effectively closed the experiment-theorise loop.

Simon Garrett, Martin Robbins
A Parallel Immune Algorithm for Global Optimization

This paper presents a parallel immune algorithm, par-IA, using the LAM/MPI library to tackle global numerical optimization problems. par-IA has been compared with two important clonal selection algorithms, CLONALG and opt-IA, and with a well-known evolutionary algorithm for function optimization, FEP. The experimental results show better overall performance of par-IA with respect to opt-IA, CLONALG, and FEP. Considering the results obtained, we can claim that par-IA is a robust immune algorithm for effectively performing global optimization tasks.

Vincenzo Cutello, Giuseppe Nicosia, Emilio Pavia

Invited Session: Data Mining – Algorithms and Applications

Frontmatter
Data Mining Approach to Classification of Archaeological Aerial Photographs

Aerial archaeology plays an important role in the detection and documentation of archaeological sites, which often cannot be easily seen from the ground. It is a quick way to survey large areas, but requires a lot of error-prone human work to analyze it afterwards. In this paper we utilize some of the best-performing image processing and data mining methods to develop a system capable of an accurate automated classification of such aerial photographs. The system consists of phases of image indexing, rough image segmentation, feature extraction, feature grouping and building the classifier. We present the results of experiments conducted on a real set of archaeological and non-archaeological aerial photographs and conclude with perspectives for future work.

Łukasz Kobyliński, Krzysztof Walczak
Hierarchical Document Clustering Using Frequent Closed Sets


Marzena Kryszkiewicz, Łukasz Skonieczny
Mining Spatial Association Rules with No Distance Parameter

The paper focuses on finding spatial association rules, proposing a new approach to mining them. The neighborhood is defined in terms of Delaunay diagrams, instead of predefining distance thresholds, which requires extra runs. Once a Delaunay diagram is created, it is used for determining neighborhoods, and then, based on this knowledge, association rules are found.

Robert Bembenik, Henryk Rybiński

Invited Session: Fundamental Tools for the Lexical and Morphosyntactic Processing of Polish

Frontmatter
Morfeusz — a Practical Tool for the Morphological Analysis of Polish

This paper describes a morphological analyser for Polish. Its features include a large dictionary, a carefully designed tagset, presentation of results as a DAG of interpretations, high efficiency, and free availability for non-commercial use and scientific research.

Marcin Woliński
Domain–Driven Automatic Spelling Correction for Mammography Reports

The paper presents a program for automatic spelling correction of texts from a very specific domain, applied to mammography reports. We describe different types of errors and present the correction procedure, based on the Levenshtein distance and bigram probabilities.
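The edit-distance half of such a corrector can be sketched briefly. This is a generic illustration, not the authors' program: the tiny mammography lexicon is invented, and the bigram-probability step the paper uses for disambiguation is omitted for brevity.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, lexicon):
    """Replace a misspelled word with the closest in-domain word."""
    return min(lexicon, key=lambda w: levenshtein(word, w))

lexicon = ["mammography", "malignant", "benign", "lesion"]
print(correct("mamography", lexicon))   # 'mammography'
```

Restricting the lexicon to the domain is what makes this feasible: with only a few hundred report terms, a nearest-neighbour search by edit distance is both fast and precise.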

Agnieszka Mykowiecka, Małgorzata Marciniak
Reductionistic, Tree and Rule Based Tagger for Polish

The paper presents an approach to tagging Polish based on a combination of handmade reduction rules and selection rules acquired by induction of decision trees. The general open architecture of the tagger is presented, in which the tagging process is divided into subsequent steps and the overall problem is reduced to subproblems of ambiguity classes. A special language of constraints and the use of constraints as elements of decision trees are described. The results of experiments performed with the tagger are also presented.

Maciej Piasecki, Grzegorz Godlewski
Backmatter
Metadata
Title
Intelligent Information Processing and Web Mining
Edited by
Mieczysław A. Kłopotek
Sławomir T. Wierzchoń
Krzysztof Trojanowski
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-33521-4
Print ISBN
978-3-540-33520-7
DOI
https://doi.org/10.1007/3-540-33521-8