
This book constitutes the refereed proceedings of the 16th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2012, held in San Sebastian, Spain, in September 2012.

The 20 revised full papers presented were carefully reviewed and selected from 130 submissions. The papers are organized in topical sections on bioinspired and machine learning methods, machine learning applications, semantics and ontology based techniques, and lattice computing and games.

### Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise

Abstract
Ensemble machine learning methods incorporating random subspace and random forest, with genetic fuzzy rule-based systems as base learning algorithms, were developed in the Matlab environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sale/purchase transactions. The accuracy of the ensembles generated by the proposed methods was compared with that of bagging, repeated holdout, and repeated cross-validation models. The tests were made for four levels of noise injected into the benchmark datasets. The results were analysed using statistical methodology including nonparametric tests followed by post-hoc procedures designed especially for multiple N×N comparisons.
Tadeusz Lasota, Zbigniew Telec, Bogdan Trawiński, Grzegorz Trawiński
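
The random subspace idea above can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the genetic fuzzy rule-based base learner is swapped for a plain k-nearest-neighbour regressor, random forest is not shown, and the Gaussian target perturbation in the hypothetical `inject_noise` helper is only an assumed noise model, since the abstract does not say how noise was injected.

```python
import random
from statistics import mean, pstdev


def knn_predict(train_X, train_y, x, k=3):
    """k-nearest-neighbour regression with squared Euclidean distance."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return mean(y for _, y in ranked[:k])


def random_subspace_fit(n_features, n_members=10, subspace_size=2, seed=0):
    """Each ensemble member is defined by a random subset of feature indices."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_members)]


def random_subspace_predict(members, X, y, x, k=3):
    """Average the predictions of base learners that see only their own features."""
    preds = []
    for feats in members:
        proj_X = [[row[i] for i in feats] for row in X]
        proj_x = [x[i] for i in feats]
        preds.append(knn_predict(proj_X, y, proj_x, k))
    return mean(preds)


def inject_noise(y, level, seed=0):
    """Hypothetical noise model: add Gaussian noise with std = level * std(y)."""
    rng = random.Random(seed)
    sd = pstdev(y)
    return [v + rng.gauss(0.0, level * sd) for v in y]


X = [[i, i * i, 10 - i] for i in range(10)]
y = [3 * i for i in range(10)]
members = random_subspace_fit(n_features=3)
print(random_subspace_predict(members, X, y, X[4], k=1))  # 12
```

Each member works on a different random projection of the data, which decorrelates the members; averaging is the usual way to combine regressors.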

### A Genetic Algorithm vs. Local Search Methods for Solving the Orienteering Problem in Large Networks

Abstract
The Orienteering Problem (OP) can be modelled as a weighted graph whose vertices each have a score. The main goal of the OP is to find a route that maximises the sum of scores while keeping the length of the route within a given limit. In this paper we present our genetic algorithm (GA) with insertion and removal mutations for solving the OP. We compare our results with other local search methods: the greedy randomised adaptive search procedure (GRASP), also combined with path relinking (PR), and guided local search (GLS). Computational experiments were conducted on a large transport network (908 cities in Poland). They indicate that our algorithm gives better results and is significantly faster than these local search methods.
Joanna Karbowska-Chilińska, Paweł Zabielski
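
The two mutation operators named above might look roughly like this, assuming a complete graph with a symmetric, metric distance matrix `dist`, fixed start and end vertices, and non-negative scores. The `evolve` loop is a hypothetical mutation-only, elitist scheme for illustration; it omits crossover and any parameters of the authors' GA.

```python
import math
import random


def route_length(route, dist):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def route_score(route, scores):
    # A vertex's score is collected once, even when start == end.
    return sum(scores[v] for v in set(route))


def insert_mutation(route, dist, limit, rng):
    """Insert a random unvisited vertex at its cheapest position if the limit allows."""
    unvisited = [v for v in range(len(dist)) if v not in route]
    if not unvisited:
        return route
    v = rng.choice(unvisited)
    # Positions 1..len(route)-1 keep the start and end vertices fixed.
    extra, pos = min(
        (dist[route[p - 1]][v] + dist[v][route[p]] - dist[route[p - 1]][route[p]], p)
        for p in range(1, len(route))
    )
    if route_length(route, dist) + extra <= limit:
        return route[:pos] + [v] + route[pos:]
    return route


def remove_mutation(route, rng):
    """Drop a random inner vertex; with metric distances the route cannot get longer."""
    if len(route) <= 2:
        return route
    pos = rng.randrange(1, len(route) - 1)
    return route[:pos] + route[pos + 1:]


def evolve(dist, scores, start, end, limit, generations=200, pop_size=20,
           p_insert=0.7, seed=0):
    """Hypothetical mutation-only, elitist loop (not the authors' full GA)."""
    rng = random.Random(seed)
    population = [[start, end] for _ in range(pop_size)]
    for _ in range(generations):
        children = [
            insert_mutation(r, dist, limit, rng) if rng.random() < p_insert
            else remove_mutation(r, rng)
            for r in population
        ]
        population = sorted(population + children,
                            key=lambda r: route_score(r, scores),
                            reverse=True)[:pop_size]
    return population[0]


coords = [(0, 0), (1, 0), (2, 0), (2, 1), (0, 1)]
dist = [[math.dist(p, q) for q in coords] for p in coords]
scores = [0, 5, 3, 4, 2]
best = evolve(dist, scores, start=0, end=0, limit=3.0)
print(best, route_score(best, scores), route_length(best, dist))
```

Insertion is accepted only when the route stays within `limit`, and removal cannot lengthen a route under the triangle inequality, so every individual stays feasible without a penalty term.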

### Dynamic Structure of Volterra-Wiener Filter for Reference Signal Cancellation in Passive Radar

Abstract
The article considers the possibility of using the Volterra-Wiener filter for reference signal elimination in passive radar. Recursive nonlinear orthogonal filter algorithms (with low-complexity and dynamic structures) were developed and implemented in the Matlab environment. The results of tests with real-life data are comparable with those obtained using the NLMS filter algorithm.
Pawel Biernacki
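
Since the results are benchmarked against NLMS, a minimal NLMS canceller may help make that baseline concrete. It is a sketch under stated assumptions: real rather than complex baseband samples, a hypothetical filter `order` and step size `mu`, and a synthetic direct-path echo; it does not implement the Volterra-Wiener filter itself.

```python
import random


def nlms_cancel(reference, received, order=4, mu=0.5, eps=1e-6):
    """Cancel the reference (direct-path) component of `received` with an NLMS FIR filter.

    Returns (errors, weights); errors[n] = received[n] - w . x[n] is the
    residual surveillance signal after cancellation.
    """
    w = [0.0] * order
    errors = []
    for n in range(len(received)):
        # Tap-delay line x[n], x[n-1], ..., x[n-order+1], zero before the start.
        x = [reference[n - k] if n >= k else 0.0 for k in range(order)]
        e = received[n] - sum(wi * xi for wi, xi in zip(w, x))
        # Normalised step size: mu / (eps + ||x||^2).
        step = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + step * xi for wi, xi in zip(w, x)]
        errors.append(e)
    return errors, w


# Synthetic case: the received signal is only a two-tap echo of the reference.
rng = random.Random(1)
ref = [rng.gauss(0.0, 1.0) for _ in range(3000)]
rx = [0.8 * ref[n] + (0.3 * ref[n - 2] if n >= 2 else 0.0) for n in range(len(ref))]
err, w = nlms_cancel(ref, rx)
print([round(wi, 3) for wi in w])
```

In a real passive radar the residual `err` would still contain target echoes, which are what remains after the reference is suppressed.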

### A Web Browsing Cognitive Model

Abstract
Web usage has been studied from the point of view of machine learning. However, web usage prediction is mostly restricted to a static web site structure, a restriction that is hard to satisfy in practice. We propose a decision-making model that allows predicting web users’ navigation choices even in dynamic web sites. We propose a neurophysiological theory of web browsing decision making based on the Leaky Competing Accumulator (LCA). The model is stochastic and has been studied in psychology for many years. Choices to follow a hyperlink are made according to the user’s text preferences. This process is repeated until the web user decides to leave the web site. The model’s parameters must be fitted in order to perform Monte Carlo simulations. It has been observed that nearly 73% of the real distribution is recovered by this method.
Pablo E. Román, Juan D. Velásquez
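
The commonly cited form of the LCA dynamics, `dx_i = (rho_i - k*x_i - beta*sum_{j!=i} x_j) * dt/tau + xi * sqrt(dt/tau) * N(0, 1)` with activations truncated at zero, can be simulated as below. This is a hedged sketch: mapping each input `rho_i` to a hyperlink's text similarity with the user's preferences (plus one option for leaving the site) is an assumption, and all parameter values are illustrative, not the values fitted in the paper.

```python
import math
import random


def lca_trial(inputs, rng, leak=0.2, inhibition=0.2, noise=0.3,
              threshold=1.0, dt=0.01, tau=0.1, max_steps=100_000):
    """One LCA race; returns (index of winning option, decision time) or (None, t_max)."""
    x = [0.0] * len(inputs)
    scale = dt / tau
    for step in range(1, max_steps + 1):
        total = sum(x)
        # Euler-Maruyama step: input - leak - lateral inhibition + Gaussian noise,
        # with activations truncated at zero.
        x = [max(0.0, xi + (rho - leak * xi - inhibition * (total - xi)) * scale
                 + noise * math.sqrt(scale) * rng.gauss(0.0, 1.0))
             for xi, rho in zip(x, inputs)]
        winner = max(range(len(x)), key=x.__getitem__)
        if x[winner] >= threshold:
            return winner, step * dt
    return None, max_steps * dt


def choice_distribution(inputs, trials=1000, seed=0):
    """Monte Carlo estimate of how often each option (e.g. each link) wins."""
    rng = random.Random(seed)
    counts = [0] * len(inputs)
    for _ in range(trials):
        winner, _ = lca_trial(inputs, rng)
        if winner is not None:
            counts[winner] += 1
    return [c / trials for c in counts]


print(choice_distribution([2.0, 0.2], trials=200))
```

Comparing such simulated choice frequencies with observed navigation frequencies is the kind of Monte Carlo check the abstract describes.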

### Optimization of Approximate Decision Rules Relative to Number of Misclassifications: Comparison of Greedy and Dynamic Programming Approaches

Abstract
In this paper, we present a comparison of dynamic programming and greedy approaches for the construction and optimization of approximate decision rules relative to the number of misclassifications. We use an uncertainty measure that is the difference between the number of rows in a decision table T and the number of rows with the most common decision for T. For a nonnegative real number γ, we consider γ-decision rules that localize rows in subtables of T with uncertainty at most γ. Experimental results with decision tables from the UCI Machine Learning Repository are also presented.
Talha Amin, Igor Chikalov, Mikhail Moshkov, Beata Zielosko
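
The uncertainty measure and a greedy construction of a γ-decision rule can be sketched as follows. The attribute-selection heuristic (add the condition whose subtable has the lowest uncertainty) is an assumption made for illustration and may differ from the greedy algorithm in the paper; the dynamic programming approach is not shown.

```python
from collections import Counter


def uncertainty(rows):
    """J(T): number of rows minus the number of rows with the most common decision."""
    if not rows:
        return 0
    return len(rows) - Counter(d for _, d in rows).most_common(1)[0][1]


def greedy_gamma_rule(rows, row, gamma):
    """Greedy gamma-decision rule covering `row`; rows are (attribute_values, decision)."""
    values, _ = row
    sub, conditions = rows, []
    remaining = list(range(len(values)))
    while uncertainty(sub) > gamma and remaining:
        # Hypothetical heuristic: add the condition a_i = values[i] whose
        # subtable has the smallest uncertainty (ties -> lowest index).
        best = min(remaining, key=lambda i: uncertainty(
            [t for t in sub if t[0][i] == values[i]]))
        remaining.remove(best)
        conditions.append((best, values[best]))
        sub = [t for t in sub if t[0][best] == values[best]]
    decision = Counter(d for _, d in sub).most_common(1)[0][0]
    # Misclassifications: rows of the subtable whose decision differs from the rule's.
    misclassified = sum(1 for _, d in sub if d != decision)
    return conditions, decision, misclassified


rows = [((0, 0), "a"), ((0, 1), "b"), ((1, 0), "a"), ((1, 1), "a")]
print(uncertainty(rows))                     # 1
print(greedy_gamma_rule(rows, rows[1], 0))   # ([(0, 0), (1, 1)], 'b', 0)
```

When the rule's decision is the most common decision of the final subtable, its number of misclassifications equals that subtable's uncertainty, which is why the loop can stop as soon as the uncertainty is at most γ.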

### Set-Based Detection and Isolation of Intersampled Delays and Packet Dropouts in Networked Control

Abstract
This paper focuses on a set-based method for the detection and isolation of network-induced time-delays and packet dropouts. We pay particular attention to systems described by a linear discrete-time equation affected by additive disturbances and controlled via redundant communication channels prone to intersampled time-delays and packet dropouts. From a classical control theory perspective, time-delays and packet dropouts are considered as faults. We show that fault detection and isolation can be achieved indirectly through the separation of positively invariant sets that correspond to the closed-loop dynamics with healthy and faulty communication channels. For this purpose, in order to provide a reference signal that uniquely guarantees fault detection and isolation in real time, we design a reference governor using a receding horizon technique.
Nikola Stanković, Sorin Olaru, Silviu-Iulian Niculescu
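
To make the idea of separating invariant sets concrete, here is a deliberately simplified scalar sketch, not the paper's construction. It assumes each communication mode yields a stable closed loop `x+ = a*x + b*r + w` with `|w| <= w_max`, whose minimal robust positively invariant set is an interval centred on the equilibrium; the mode parameters are hypothetical, and the receding-horizon reference governor is replaced by a simple search over candidate references.

```python
def invariant_interval(a, b, r, w_max):
    """Minimal robust positively invariant interval of x+ = a*x + b*r + w, |w| <= w_max."""
    if abs(a) >= 1:
        raise ValueError("the closed loop must be stable (|a| < 1)")
    centre = b * r / (1 - a)           # equilibrium for a constant reference r
    half_width = w_max / (1 - abs(a))  # sum of |a|**k * w_max over k >= 0
    return (centre - half_width, centre + half_width)


def intervals_for(modes, r, w_max):
    """One invariant interval per mode; modes maps name -> (a, b)."""
    return {name: invariant_interval(a, b, r, w_max) for name, (a, b) in modes.items()}


def disjoint(intervals):
    ordered = sorted(intervals.values())
    return all(hi < next_lo for (_, hi), (next_lo, _) in zip(ordered, ordered[1:]))


def pick_reference(modes, candidates, w_max):
    """Return the first candidate reference whose invariant sets are pairwise disjoint."""
    for r in candidates:
        if disjoint(intervals_for(modes, r, w_max)):
            return r
    return None


def isolate(x, intervals):
    """Modes whose invariant set contains the measured state x."""
    return [name for name, (lo, hi) in intervals.items() if lo <= x <= hi]


# Hypothetical modes: a healthy loop, and a "dropout" loop in which the
# reference no longer reaches the plant.
modes = {"healthy": (0.5, 0.5), "dropout": (0.8, 0.0)}
r = pick_reference(modes, [0.0, 0.5, 1.0, 1.5], w_max=0.1)
ivs = intervals_for(modes, r, 0.1)
print(r, ivs)
print(isolate(1.0, ivs), isolate(0.1, ivs))
```

Isolation by set membership is only meaningful once the state has converged into the invariant set of the active mode, which is why the reference has to keep the sets disjoint.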

### Analysis and Synthesis of the System for Processing of Sign Language Gestures and Translation of Mimic Subcode in Communication with Deaf People

Abstract
This paper presents the design and principle of operation of a system facilitating communication between hearing and deaf people. The system has a modular architecture and consists of a main application, a translation server, and two complementary databases. The main application is responsible for interaction with the user and visualization of the sign language gestures. The translation server translates text written in Polish into the corresponding sign language messages. The translation server is composed of a facts database and translation rules implemented in Prolog. The facts database contains a set of lexemes and their inflected forms, with a description of the semantics of the units. The translation rules identify and analyse the basic structures of the Polish sentence. These basic structures are related to the sentence-forming function of the verb predicate. On the basis of this analysis, an equivalent translation of the text into sign language is produced. The translated text, in the form of metadata, is passed to the main application, where it is rendered as the corresponding sign language gestures and facial mimicry. The gestures, in the form of 3D vectors, and the facial mimicry are stored in the main database as binary objects. The authors intend to apply the translation system in various public institutions such as hospitals, clinics, post offices, schools and offices.
Wojciech Koziol, Hubert Wojtowicz, Kazimierz Sikora, Wieslaw Wajs
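
The lookup performed against the facts database can be pictured with a small Python sketch, used here instead of Prolog. The dictionary entries, the `analyse` function, and its output format are hypothetical; the real system's translation rules and metadata are not reproduced.

```python
# Hypothetical fragment of the facts database:
# inflected form -> (lexeme, part of speech)
FACTS = {
    "marek": ("Marek", "noun"),
    "czyta": ("czytać", "verb"),
    "książkę": ("książka", "noun"),
}


def analyse(sentence, facts=FACTS):
    """Reduce inflected forms to lexemes and locate the verb predicate."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    entries = [facts.get(t, (t, "unknown")) for t in tokens]
    predicate = next((lex for lex, pos in entries if pos == "verb"), None)
    return [lex for lex, _ in entries], predicate


# "Marek czyta książkę." = "Marek is reading a book."
print(analyse("Marek czyta książkę."))
```

The predicate found this way is the anchor around which the sentence structure would be analysed before the lexemes are mapped to stored gestures.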

### Prediction of Software Quality Based on Variables from the Development Process

Abstract
Since the emergence of software engineering, much effort has been devoted to improving the software development process. More recently, software quality has received attention from researchers due to the importance that software has gained in supporting all levels of organizations. New methods, techniques, and tools have been created to increase the quality and productivity of the software development process. Approaches based, for example, on practitioners’ experience or on the analysis of data generated during the development process have been adopted. This paper follows the second path, applying data mining procedures to identify the variables of the development process that most affect software quality. The premise is that the quality of decision making in software project management is closely related to the information gathered during the development process. A case study is presented in which several regression models were built to explore this idea during the testing, approval, and production phases. The results can be applied mainly to help development managers focus on those variables in order to improve the quality of the software as a final product.
Hércules Antonio do Prado, Fábio Bianchi Campos, Edilson Ferneda, Nildo Nunes Cornelio, Aluizio Haendchen Filho
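
As a hedged illustration of screening development-process variables before fitting regression models, the sketch below ranks variables by their absolute Pearson correlation with a quality indicator. The variable names and values are invented, and correlation screening is a simpler technique than the regression models the authors built.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant variable carries no linear information; report 0 instead of dividing by 0.
    return cov / (sx * sy) if sx and sy else 0.0


def rank_variables(process_vars, quality):
    """Sort process variables by |r| with the quality indicator, strongest first."""
    scores = {name: pearson(values, quality) for name, values in process_vars.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented data: one quality indicator per project and three process variables.
quality = [10, 8, 6, 4]
process_vars = {
    "team_size": [3, 5, 4, 4],
    "requirement_changes": [1, 2, 3, 4],
    "test_hours": [5, 5, 5, 5],
}
for name, r in rank_variables(process_vars, quality):
    print(f"{name:20s} r = {r:+.3f}")
```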

### Mining High Performance Managers Based on the Results of Psychological Tests

Abstract
Selecting high performance managers is a risky task, mainly due to the costs involved in a wrong choice. This has led to the development of many approaches for selecting the candidates that best fit the requirements of a given position. However, defining the features that most strongly determine good personnel performance is still a problem. In this paper, we discuss an approach, based on data mining techniques, to help managers in this process. We built a classifier based on the Combinatorial Neural Model (CNM), taking as the dependent variable the performance of managers as observed over their careers. As independent variables, we considered the results of well-known psychological tests (MBTI and DISC). The rules generated by CNM revealed interesting relations between the psychological profile of managers at their starting point in the company and the quality of their work after some years in the role. These rules are expected to support the improvement of the selection process by directing the choice towards candidates with the best prospects. The allocation of people (the right professional in the right place) should also be improved.
Edilson Ferneda, Hércules Antonio do Prado, Alexandre G. Cancian Sobrinho, Remis Balaniuk
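
A rough sketch of CNM-style learning follows, assuming the commonly described reward/punishment scheme in which each combination of input evidences keeps one accumulator per class. Evidences here are invented binary test outcomes (for example `E` for an extraverted MBTI profile, `D` for a dominant DISC profile); fuzzy activation levels and the original pruning criteria are omitted.

```python
from collections import defaultdict
from itertools import combinations


def train_cnm(examples, max_order=2):
    """examples: list of (set_of_evidences, class_label).

    Every combination of present evidences (up to max_order inputs) gets one
    accumulator per class: +1 (reward) when it fires for an example of that
    class, -1 (punishment) when it fires for an example of another class.
    """
    classes = {label for _, label in examples}
    acc = defaultdict(int)
    for evidences, label in examples:
        for k in range(1, max_order + 1):
            for combo in combinations(sorted(evidences), k):
                for c in classes:
                    acc[(combo, c)] += 1 if c == label else -1
    return acc


def extract_rules(acc, threshold):
    """Keep combinations whose accumulator reaches the pruning threshold."""
    return sorted((combo, c, v) for (combo, c), v in acc.items() if v >= threshold)


examples = [
    ({"E", "D"}, "high"),
    ({"E", "D"}, "high"),
    ({"I", "S"}, "low"),
    ({"E", "S"}, "low"),
]
for combo, label, strength in extract_rules(train_cnm(examples), threshold=2):
    print(" AND ".join(combo), "=>", label, f"(strength {strength})")
```

Surviving combinations read directly as rules such as "D AND E => high", which is what makes this kind of model attractive for explaining selection decisions.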

### Semantics Preservation in Schema Mappings within Data Exchange Systems

Abstract
We discuss the problem of preserving the semantics of data in data exchange systems. The semantics of data is defined in a source database schema by means of integrity constraints, and we expect that the semantics will be preserved after the data are transformed into a target database. The transformation is defined by means of schema mappings. In order to reason about the soundness and completeness of such mappings with respect to semantics preservation, we propose a method of developing a knowledge base capturing both the databases and the mappings in a data exchange system. Formal reasoning about the consistency of schema mappings with the integrity constraints of the databases and with knowledge of the application domain can then be performed.
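
The paper reasons about mappings formally through a knowledge base; the much weaker instance-level check below only illustrates what preserving semantics means for one kind of integrity constraint, a functional dependency. The schemas, data, and mappings are hypothetical.

```python
def apply_mapping(source_rows, mapping):
    """Apply a tuple-to-tuple schema mapping and collect the target relation."""
    return {mapping(row) for row in source_rows}


def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs (column positions) on a set of tuples."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Source Employee(emp_id, name, dept) with key emp_id.
employees = [(1, "Ann", "R&D"), (2, "Ann", "Sales"), (3, "Bob", "R&D")]
by_id = apply_mapping(employees, lambda e: (e[0], e[2]))    # Staff(emp_id, dept)
by_name = apply_mapping(employees, lambda e: (e[1], e[2]))  # Person(name, dept)
print(satisfies_fd(by_id, [0], [1]))    # True: the key survives the mapping
print(satisfies_fd(by_name, [0], [1]))  # False: "Ann" maps to two departments
```

A check on one instance can only refute a mapping; showing that a mapping preserves constraints for every instance needs the kind of formal reasoning the abstract describes.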

### Multi-Relational Learning for Recommendation of Matches between Semantic Structures

Abstract
The paper presents the Tensor-based Reflective Relational Learning System (TRRLS), a tensor-based approach to the automatic recommendation of matches between nodes of semantic structures. The system may be seen as performing probabilistic inference with regard to the relation representing the ‘semantic equivalence’ of ontology classes. Although TRRLS is based on a new idea of algebraic modeling of multi-relational data, it provides results comparable to those achieved by the leading solutions of the Ontology Alignment Evaluation Initiative (OAEI) contest in the task of matching concepts of the Anatomy track ontologies on the basis of partially known expert matches.
Andrzej Szwabe, Pawel Misiorek, Przemyslaw Walkowiak

### Semantically Enhanced Text Stemmer (SETS) for Cross-Domain Document Clustering

Abstract
This paper focuses on processing cross-domain document repositories, a task challenged by word ambiguity and by the fact that monosemic words are more domain-oriented than polysemic ones. The paper describes a semantically enhanced text normalization algorithm (SETS) aimed at improving document clustering, and investigates the performance of the sk-means clustering algorithm across domains by comparing the cluster coherence produced with semantic-based and traditional (TF-IDF-based) document representations. The evaluation is conducted on 20 generic sub-domains of a thousand documents each, randomly selected from the Reuters-21578 corpus. The experimental results demonstrate improved coherence of the clusters produced with SETS compared to text normalization with the Porter stemmer. In addition, semantic-based text normalization is shown to be resistant to noise, which is often introduced in the index aggregation stage.
Ivan Stankov, Diman Todorov, Rossitza Setchi
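
The TF-IDF baseline and the sk-means step can be sketched as follows, assuming sk-means denotes spherical k-means (k-means on unit-length vectors with cosine similarity). Tokens are assumed to be already normalised, for example by a stemmer; SETS itself is not reproduced, and the farthest-first initialisation is an illustrative choice, not taken from the paper.

```python
import math
import random
from collections import Counter


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {t: x / norm for t, x in v.items()} if norm else dict(v)


def dot(u, v):
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(t, 0.0) for t, x in u.items())


def tfidf(docs):
    """Token lists -> unit-length sparse TF-IDF vectors (dict term -> weight)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [normalize({t: c * math.log(n / df[t]) for t, c in Counter(doc).items()})
            for doc in docs]


def spherical_kmeans(vecs, k, iters=20, seed=0):
    """k-means on the unit sphere: cosine assignment, normalised-mean centroids."""
    rng = random.Random(seed)
    # Farthest-first initialisation (an assumption, not taken from the paper).
    centroids = [dict(vecs[rng.randrange(len(vecs))])]
    while len(centroids) < k:
        far = min(vecs, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vecs]
        for j in range(k):
            summed = Counter()
            for v, label in zip(vecs, labels):
                if label == j:
                    summed.update(v)
            if summed:
                centroids[j] = normalize(dict(summed))
    return labels


docs = [["stock", "market", "shares"], ["market", "shares", "trading"],
        ["football", "goal", "match"], ["goal", "match", "league"]]
print(spherical_kmeans(tfidf(docs), k=2))
```

Swapping the TF-IDF vectors for vectors built from semantically normalised terms is where an approach like SETS would plug in.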

### Ontology Recomposition

Abstract
Electronic devices provide many different kinds of simple services, most of which are able to interoperate with each other. To provide sufficient quality of service discovery and composition, the ontologies used to describe services should contain many concepts and relations, which makes them large and thus unsuitable for processing on mobile devices. To fully exploit the power of a service-oriented approach in a pervasive computing environment, a technique of runtime ontology recomposition is proposed. Service descriptions created from a common ontology are composed into a runtime version of the ontology. This runtime ontology is then used to resolve service queries instead of the original one. The semantic richness of the descriptions influences the quality of service discovery and composition; the original ontology is used to assess their most favourable richness.
Tomasz Rybicki
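
A minimal sketch of runtime recomposition, assuming a single-inheritance taxonomy stored as a child-to-parent map and service descriptions given as lists of concepts. The runtime ontology keeps only the concepts used by the descriptions plus their ancestors, which is enough to answer subsumption-based queries; all names are hypothetical, and the assessment of description richness is not modelled.

```python
def ancestors(concept, parent):
    """Superconcepts of `concept`, nearest first (acyclic single-inheritance taxonomy)."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def recompose(descriptions, parent):
    """Runtime taxonomy: concepts used in service descriptions plus their ancestors."""
    keep = set()
    for description in descriptions:
        for concept in description:
            keep.add(concept)
            keep.update(ancestors(concept, parent))
    return {c: p for c, p in parent.items() if c in keep}


def subsumes(general, specific, parent):
    return general == specific or general in ancestors(specific, parent)


def match(query, services, parent):
    """Services offering a concept subsumed by the queried concept."""
    return [name for name, concepts in services.items()
            if any(subsumes(query, c, parent) for c in concepts)]


full_ontology = {
    "Device": "Thing", "Sensor": "Thing",
    "Printer": "Device", "ColorPrinter": "Printer", "Scanner": "Device",
    "Thermometer": "Sensor",
}
services = {"office-printer": ["ColorPrinter"], "scan-1": ["Scanner"]}
runtime = recompose(services.values(), full_ontology)
print(sorted(runtime))
print(match("Device", services, runtime))
```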

### Structured Construction of Knowledge Repositories with MetaConcept

Abstract
One of the main issues during the development of a knowledge repository for the Technical Area of Architecture, called KOC, was the organization and structuring of this knowledge domain: the ArCo ontology. Although powerful tools for writing ontologies exist, they do not guide their development. MetaConcept addresses this need by defining the concepts that describe the structure of the ontology and its construction process: first a meta-ontology for the domain, then the ontology itself, and then the knowledge repository. Two examples are presented: KOC 2.0, a data repository of real-world experiences in building construction together with its knowledge repository, and ELLES, a repository of lessons learned in the domain of software construction projects.
Germán Bravo

### Association between Teleology and Knowledge Mapping

Abstract
Today, knowledge mapping can be performed on the web in many different ways. One of them, and an increasingly popular one, is social networks, where people can share various content with friends, in user groups, or with the public. However, most of these networks are for socializing and entertainment. To emphasize work and results, we use teleology, a branch of philosophy which holds that each object has a final purpose. In this paper, we introduce a framework for knowledge mapping in which the teleological perspective of knowledge is mapped and a predefined ontology provides its context. Thus, we are able to design a social network for strategic knowledge mapping, useful for tracking objectives and areas of interest in academic as well as business environments.
Pavel Krbálek, Miloš Vacek

### Boosting Retrieval of Digital Spoken Content

Abstract
Every day, the Internet expands as millions of new multimedia objects are uploaded in the form of audio, video and images. While traditional text-based content is indexed by search engines, this indexing cannot be applied to audio and video objects, resulting in a plethora of multimedia content that is inaccessible to most online users. To address this issue, we introduce a technique for the automatic, semantically enhanced generation of descriptions for multimedia content. The objective is to facilitate indexing and retrieval of the objects with the help of traditional search engines. Essentially, the technique automatically generates static Web pages that describe the content of the digital audio and video objects. These descriptions are then organized so as to facilitate locating the corresponding audio and video segments. The technique employs a combination of Web services and concurrently provides description translation and semantic enhancement. A thorough analysis of click data, comparing accesses to the digital content before and after automatic description generation, suggests a significant increase in the number of retrieved items. This outcome, however, is not limited to visibility: by supporting multilingual access, the technique also reduces language barriers.
Bernardo Pereira Nunes, Alexander Mera, Marco A. Casanova, Ricardo Kawase
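
One way to make time-coded descriptions point back into the media is the temporal syntax of W3C Media Fragments URI 1.0 (`#t=start,end`, in seconds). The sketch below generates such a static page from transcript segments; the function, URL, and segment data are hypothetical, and the translation and semantic-enhancement Web services are not shown.

```python
from html import escape


def describe(title, media_url, segments, lang="en"):
    """Build a static HTML page whose list items link to time ranges of the media.

    segments: list of (start_seconds, end_seconds, transcript_text).
    """
    items = "\n".join(
        f'<li><a href="{escape(media_url)}#t={start:g},{end:g}">'
        f"{start:g}s to {end:g}s</a>: {escape(text)}</li>"
        for start, end, text in segments
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n<ol>\n{items}\n</ol>\n</body>\n</html>\n"
    )


page = describe("Lecture 1", "https://example.org/l1.mp4",
                [(0, 30, "Intro & goals"), (30, 95.5, "Main results")])
print(page)
```

Because the output is plain static HTML, an ordinary crawler can index the transcript text, and each hit links straight to the matching segment.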

### Constructing the Integral OLAP-Model for Scientific Activities Based on FCA

Abstract
In this paper, an original approach to analytical decision-making support based on on-line analytical processing of multidimensional data is suggested. According to Dr. Codd’s rules, the effectiveness of data analysis depends significantly on data accessibility and on the transparency of the analytical model of the domain. A method of constructing a conceptual OLAP-model as an integral analytical model of the domain is proposed. The method is illustrated by the example of the scientific activities domain. The integral analytical model includes all possible combinations of the analyzed objects and allows them to be manipulated ad hoc. The suggested method consists of a formal concept analysis of measures and dimensions based on expert knowledge about the structure of the analyzed objects and their comparability. As a result, the conceptual OLAP-model is represented as a concept lattice of multidimensional cubes. Concept lattice features allow the decision maker to discover nonstandard analytical dependencies on the set of all actual measures and dimensions of the scientific activities domain. Implementing the conceptual OLAP-model allows users to make better decisions based on on-line analytical processing of scientific activity indicators.
Tatiana Penkova, Anna Korobko
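
Treating measures as objects and dimensions as attributes of a formal context (an assumption about how the method is set up), every formal concept groups the measures that share a set of dimensions, that is, a candidate cube. A naive enumeration works for small contexts because the intents are exactly the intersections of object intents; the measure and dimension names are invented.

```python
def formal_concepts(context):
    """All formal concepts (extent, intent) of a context {object: set_of_attributes}.

    Intents are exactly the intersections of object intents (plus the full
    attribute set), so they can be generated by repeated intersection.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & i for i in intents}

    def extent(intent):
        return frozenset(o for o, attrs in context.items() if intent <= attrs)

    return sorted(((extent(i), i) for i in intents),
                  key=lambda c: (len(c[0]), sorted(c[0])))


context = {
    "publications": {"year", "department", "author"},
    "grants": {"year", "department"},
    "patents": {"year", "author"},
}
for ext, intent in formal_concepts(context):
    print(sorted(ext), "share", sorted(intent))
```

Ordering these concepts by extent inclusion gives the concept lattice; its top element here is the cube of all three measures over the common `year` dimension.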

Abstract
The paper proposes an extension of the modal logic $\mathcal{AG}_n$ with operators for reasoning about different types of strategies which agents may adopt in order to win a dialogue game. We model agent communication using the paradigm of formal systems of dialogues and, in particular, a system proposed by Prakken. In the paper, the traditional notion of a winning strategy is extended with a notion of a strategy giving a chance for success and a notion of a strategy giving a particular degree of chances for victory. Then, using the framework of Alternating-time Temporal Logic (ATL), we specify $\mathcal{AG}_n$ operators which allow the investigation of the dialogical strategies.
Magdalena Kacprzak, Katarzyna Budzynska

### Low-Cost Computer Vision Based Automatic Scoring of Shooting Targets

Abstract
This paper introduces an automatic scoring algorithm for shooting targets based on computer vision techniques. As opposed to professional solutions, the proposed system requires no additional equipment and relies solely on straightforward existing image processing methods such as Prewitt edge detection and the Hough transform. Experimental results show that the method can achieve high-quality scoring. The proposed algorithm achieves a 99 percent hole detection rate, which becomes 92 percent after false positives are eliminated. The average error of the automatic score estimation is 0.05 points. The estimation error for over 91 percent of the holes is lower than a tournament-scoring threshold. Therefore, the system may be suitable for amateur shooters interested in professional (tournament-grade) accuracy.
Jacek Rudzinski, Marcin Luckner
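
The edge-detection step can be sketched in pure Python as below, assuming a grayscale image stored as a list of rows. Only the Prewitt gradient and a thresholded edge map are shown; the Hough transform for hole and ring detection, the threshold value, and the border handling are left as assumptions.

```python
import math


def prewitt_magnitude(image):
    """Gradient magnitude of a grayscale image (list of rows) using 3x3 Prewitt kernels.

    Border pixels are left at 0 for simplicity.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel [[-1,0,1],[-1,0,1],[-1,0,1]]: right column minus left column.
            gx = sum(image[y + d][x + 1] - image[y + d][x - 1] for d in (-1, 0, 1))
            # Kernel [[-1,-1,-1],[0,0,0],[1,1,1]]: bottom row minus top row.
            gy = sum(image[y + 1][x + d] - image[y - 1][x + d] for d in (-1, 0, 1))
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(image, threshold):
    """Binary edge pixels that a circle Hough transform could then vote with."""
    mag = prewitt_magnitude(image)
    return [[1 if v >= threshold else 0 for v in row] for row in mag]


# A vertical step edge between columns 1 and 2.
img = [[0, 0, 10, 10, 10] for _ in range(5)]
print(edge_map(img, threshold=15)[2])  # [0, 1, 1, 0, 0]
```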

### Using Bookmaker Odds to Predict the Final Result of Football Matches

Abstract
There are many online bookmakers that allow betting money on virtually every sport, from football to chess. The vast majority of online bookmakers operate on standard principles and set the odds for sporting events. These odds change constantly due to bets placed by gamblers; the size of the changes is related to the amount of money bet on a given outcome. The purpose of this paper is to investigate the possibility of predicting how upcoming football matches will end based on changes in bookmaker odds. A number of different classifiers that predict the final result of a football match were developed. The results obtained confirm that the knowledge of a group of people about football matches, gathered in the form of bookmaker odds, can be successfully used to predict the final result.
Karol Odachowski, Jacek Grekow
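
A common way to turn decimal odds into features is to invert them and normalise away the bookmaker's margin. The sketch below computes the shift in implied probability between opening and closing odds, which could feed any classifier, plus a naive favourite baseline; the feature definition and odds values are illustrative assumptions, not the paper's classifiers or data.

```python
def implied_probabilities(odds):
    """Decimal odds -> probabilities, removing the bookmaker margin by normalisation."""
    raw = {outcome: 1.0 / o for outcome, o in odds.items()}
    total = sum(raw.values())  # > 1 when the bookmaker builds in a margin
    return {outcome: p / total for outcome, p in raw.items()}


def odds_change_features(opening, closing):
    """Shift in implied probability per outcome between opening and closing odds."""
    p_open, p_close = implied_probabilities(opening), implied_probabilities(closing)
    return {outcome: p_close[outcome] - p_open[outcome] for outcome in opening}


def favourite(odds):
    """Naive baseline: predict the outcome with the highest implied probability."""
    probs = implied_probabilities(odds)
    return max(probs, key=probs.get)


opening = {"home": 2.0, "draw": 3.5, "away": 4.0}
closing = {"home": 1.8, "draw": 3.6, "away": 4.5}
print(odds_change_features(opening, closing))
print(favourite(closing))
```

Proportional normalisation is the simplest margin-removal rule; other schemes weight the margin differently across outcomes.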