nach oben

2005 | Buch

Kapitel lesen Erstes Kapitel lesen

Transactions on Rough Sets IV

herausgegeben von: James F. Peters, Andrzej Skowron

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

Volume IV of the Transactions on Rough Sets (TRS) introduces a number of new advances in the theory and application of rough sets. Rough sets and - proximationspaceswereintroducedmorethan30yearsagobyZdzis lawPawlak. These advances have profound implications in a number of research areas such as the foundations of rough sets, approximate reasoning, arti?cial intelligence, bioinformatics,computationalintelligence, cognitivescience, intelligentsystems, datamining,machineintelligence,andsecurity. Inaddition,itisevidentfromthe papers included in this volume that the foundations and applications of rough sets is a very active research area worldwide. A total of 16 researchers from 7 countries are represented in this volume, namely, Canada, India, Norway, S- den, Poland, Russia and the United States of America. Evidence of the vigor, breadth and depth of research in the theory and applications of rough sets can be found in the 10 articles in this volume. Prof. Pawlak has contributed a treatise on the philosophical underpinnings of rough sets. In this treatise, observations are made about the Cantor notion of a set, antinomies arising from Cantor sets, the problem of vagueness (es- cially, vague (imprecise) concepts), fuzzy sets, rough sets, fuzzy vs. rough sets as well as logic and rough sets. Among the many vistas and research directions suggested by Prof. Pawlak, one of the most fruitful concerns the model for a rough membership function, which was incarnated in many di?erent forms since its introduction by Pawlakand Skowronin 1994. Recall, here, that Prof.

Inhaltsverzeichnis

Frontmatter

Regular Papers

A Treatise on Rough Sets

Abstract

This article presents some general remarks on rough sets and their place in general picture of research on vagueness and uncertainty – concepts of utmost interest, for many years, for philosophers, mathematicians, logicians and recently also for computer scientists and engineers particularly those working in such areas as AI, computational intelligence, intelligent systems, cognitive science, data mining and machine learning. Thus this article is intended to present some philosophical observations rather than to consider technical details or applications of rough set theory. Therefore we also refrain from presentation of many interesting applications and some generalizations of the theory.

Zdzisław Pawlak

On Optimization of Decision Trees

Abstract

In the paper algorithms are considered which allow to consecutively optimize decision trees for decision tables with many-valued decisions relatively different complexity measures such as number of nodes, weighted depth, average weighted depth, etc. For decision tables over an arbitrary infinite restricted information system [5] these algorithms have (at least for the three mentioned measures) polynomial time complexity depending on the length of table description. For decision tables over one of such information systems experimental results of decision tree optimization are described.

Igor V. Chikalov, Mikhail Ju. Moshkov, Maria S. Zelentsova

Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories

Abstract

Missing data, commonly encountered in many fields of study, introduce inaccuracy in the analysis and evaluation. Previous methods used for handling missing data (e.g., deleting cases with incomplete information, or substituting the missing values with estimated mean scores), though simple to implement, are problematic because these methods may result in biased data models. Fortunately, recent advances in theoretical and computational statistics have led to more flexible techniques to deal with the missing data problem. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to deal with incomplete data. The experiments show that a hybridization of fuzzy set and rough set theories in missing data imputation algorithms leads to the best performance among our four algorithms, i.e., crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means imputation algorithms.

Dan Li, Jitender Deogun, William Spaulding, Bill Shuart

Characteristic Relations for Incomplete Data: A Generalization of the Indiscernibility Relation

Abstract

This paper shows that attribute-value pair blocks, used for many years in rule induction, may be used as well for computing indiscernibility relations for completely specified decision tables. Much more importantly, for incompletely specified decision tables, i.e., for data with missing attribute values, the same idea of attribute-value pair blocks is a convenient tool to compute characteristic sets, a generalization of equivalence classes of the indiscernibility relation, and also characteristic relations, a generalization of the indiscernibility relation. For incompletely specified decision tables there are three different ways lower and upper approximations may be defined: singleton, subset and concept. Finally, it is shown that, for a given incomplete data set, the set of all characteristic relations for the set of all congruent decision tables is a lattice.

Jerzy W. Grzymala-Busse

Supervised Learning in the Gene Ontology Part I: A Rough Set Framework

Abstract

Prediction of gene function introduces a new learning problem where the decision classes associated with the objects (i.e., genes) are organized in a directed acyclic graph (DAG). Rough set theory, on the other hand, assumes that the classes are unrelated cannot handle this problem properly. To this end, we introduce a new rough set framework. The traditional decision system is extended into DAG decision system which can represent the DAG. From this system we develop several new operators, which can determine the known and the potential objects of a class and show how these sets can be combined with the usual rough set approximations. The properties of these operators are also investigated.

Herman Midelfart

Supervised Learning in the Gene Ontology Part II: A Bottom-Up Algorithm

Abstract

Prediction of gene function for expression profiles introduces a new problem for supervised learning algorithms. The decision classes are taken from an ontology, which defines relationships between the classes. Supervised algorithms, on the other hand, assumes that the classes are unrelated. Hence, we introduce a new algorithm which can take these relationships into account. This is tested on a microarray data set created from human fibroblast cells and on several artificial data sets. Since standard performance measures do not apply to this problem, we also introduce several new measures for measuring classification performance in an ontology.

Herman Midelfart

Comparative Analysis of Deterministic and Nondeterministic Decision Tree Complexity Local Approach

Abstract

For problems over arbitrary information system we study the relationships among the complexity of a problem description, the minimal complexity of a decision tree solving this problem deterministically, and the minimal complexity of a decision tree solving this problem nondeterministically. We consider the local approach to investigation of decision trees where only attributes from a problem description are used for construction of decision trees solving this problem.

Mikhail Ju. Moshkov

A Fast Host-Based Intrusion Detection System Using Rough Set Theory

Abstract

Intrusion Detection system has become the main research focus in the area of information security. Last few years have witnessed a large variety of technique and model to provide increasingly efficient intrusion detection solutions. We advocate here that the intrusive behavior of a process is highly localized characteristics of the process. There are certain smaller episodes in a process that make the process intrusive in an otherwise normal stream. As a result it is unnecessary and most often misleading to consider the whole process in totality and to attempt to characterize its abnormal features. In the present work we establish that subsequences of reasonably small length of sequence of system calls would suffice to identify abnormality in a process. We make use of rough set theory to demonstrate this concept. Rough set theory also facilitates identifying rules for intrusion detection. The main contributions of the paper are the following- (a) It is established that very small subsequence of system call is sufficient to identify intrusive behavior with high accuracy. We demonstrate our result using DARPA’98 BSM data; (b) A rough set based system is developed that can extract rules for intrusion detection; (c) An algorithm is presented that can determine the status of a process as either normal or abnormal on-line.

Sanjay Rawat, V. P. Gulati, Arun K. Pujari

Incremental Learning and Evaluation of Structures of Rough Decision Tables

Abstract

Rough decision tables were introduced by Pawlak in the context of rough set theory. A rough decision table represents, a non-functional in general, relationship between two groups of properties of objects, referred to as condition and decision attributes, respectively. In practical applications, the rough decision tables are normally learned from data. In this process, for better coverage of the domain of interest, they can be structured into hierarchies. To achieve convergence of the learned hierarchy of rough decision tables to a stable final state, it is desirable to avoid total regeneration of the learned structure after new objects, not represented in the hierarchy, are encountered. This can be accomplished through an incremental learning process in which the structure of rough decision tables is updated, rather than regenerated, after new observations appeared. The introduction and the investigation of this incremental learning process within the framework of the rough set model is the main theme of the article. The article is also concerned with evaluation of learned decision tables and their structures by introducing the absolute gain function to measure the quality of information represented by the tables.

Wojciech Ziarko

Dissertations and Monographs

A Framework for Reasoning with Rough Sets

Abstract

Rough sets framework has two appealing aspects. First, it is a mathematical approach to deal with vague concepts. Second, rough set techniques can be used in data analysis to find patterns hidden in the data. The number of applications of rough sets to practical problems in different fields demonstrates the increasing interest in this framework and its applicability. This thesis proposes a language that caters for implicit definitions of rough sets obtained by combining different regions of other rough sets. In this way, concept approximations can be derived by taking into account domain knowledge. A declarative semantics for the language is also discussed. It is then shown that programs in the proposed language can be compiled to extended logic programs under the paraconsistent stable model semantics. The equivalence between the declarative semantics of the language and the declarative semantics of the compiled programs is proved. This transformation provides the computational basis for implementing our ideas. A query language for retrieving information about the concepts represented through the defined rough sets is also discussed. Several motivating applications are described. Finally, an extension of the proposed language with numerical measures is presented. This extension is motivated by the fact that numerical measures are an important aspect in data mining applications.

Aida Vitória

Analogy-Based Reasoning in Classifier Construction

Abstract

Analogy-based reasoning methods in machine learning make it possible to reason about properties of objects on the basis of similarities between objects. A specific similarity based method is the k nearest neighbors (k-nn) classification algorithm. In the k-nn algorithm, a decision about a new object x is inferred on the basis of a fixed number k of the objects most similar to x in a given set of examples. The primary contribution of the dissertation is the introduction of two new classification models based on the k-nn algorithm.

The first model is a hybrid combination of the k-nn algorithm with rule induction. The proposed combination uses minimal consistent rules defined by local reducts of a set of examples. To make this combination possible the model of minimal consistent rules is generalized to a metric-dependent form. An effective polynomial algorithm implementing the classification model based on minimal consistent rules has been proposed by Bazan. We modify this algorithm in such a way that after addition of the modified algorithm to the k-nn algorithm the increase of the computation time is inconsiderable. For some tested classification problems the combined model was significantly more accurate than the classical k-nn classification algorithm.

For many real-life problems it is impossible to induce relevant global mathematical models from available sets of examples. The second model proposed in the dissertation is a method for dealing with such sets based on locally induced metrics. This method adapts the notion of similarity to the properties of a given test object. It makes it possible to select the correct decision in specific fragments of the space of objects. The method with local metrics improved significantly the classification accuracy of methods with global models in the hardest tested problems.

The important issues of quality and efficiency of the k-nn based methods are a similarity measure and the performance time in searching for the most similar objects in a given set of examples, respectively. In this dissertation both issues are studied in detail and some significant improvements are proposed for the similarity measures and for the search methods found in the literature.

Arkadiusz Wojna

Backmatter

Titel: Transactions on Rough Sets IV
herausgegeben von: James F. Peters
Andrzej Skowron
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-32016-6
Print ISBN: 978-3-540-29830-4
DOI: https://doi.org/10.1007/11574798