
2004 | Book

Transactions on Rough Sets I

James F. Peters and Andrzej Skowron, Editors-in-Chief

Editors: James F. Peters, Andrzej Skowron, Jerzy W. Grzymała-Busse, Bożena Kostek, Roman W. Świniarski, Marcin S. Szczuka

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Rough Sets – Introduction

Some Issues on Rough Sets
Abstract
The aim of this paper is to give rudiments of rough set theory and present some recent research directions proposed by the author.
Rough set theory is a new mathematical approach to imperfect knowledge.
The problem of imperfect knowledge has been tackled for a long time by philosophers, logicians and mathematicians. Recently it has also become a crucial issue for computer scientists, particularly in the area of artificial intelligence. There are many approaches to the problem of how to understand and manipulate imperfect knowledge. The most successful one is, no doubt, the fuzzy set theory proposed by Lotfi Zadeh [1].
Rough set theory, proposed by the author in [2], presents yet another approach to this problem. The theory has attracted the attention of many researchers and practitioners all over the world, who have contributed substantially to its development and applications. Rough set theory overlaps with many other theories; however, we refrain from discussing these connections here. Despite these overlaps, rough set theory may be considered an independent discipline in its own right.
Rough set theory has found many interesting applications. The rough set approach seems to be of fundamental importance to AI and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, inductive reasoning and pattern recognition.
Zdzisław Pawlak
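To make the central construction of the abstract above concrete, here is a minimal Python sketch (our illustration; the function and variable names are ours): a set is approximated from below by the equivalence classes of the indiscernibility relation that are fully contained in it, and from above by the classes that merely intersect it.

```python
# Minimal sketch of rough lower/upper approximations (illustrative only).
from collections import defaultdict

def partition(universe, key):
    """Group objects into equivalence classes by an indiscernibility key."""
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)
    return list(classes.values())

def lower_approximation(classes, X):
    """Union of classes fully contained in X (objects certainly in X)."""
    return set().union(*[c for c in classes if c <= X] or [set()])

def upper_approximation(classes, X):
    """Union of classes intersecting X (objects possibly in X)."""
    return set().union(*[c for c in classes if c & X] or [set()])

# Example: objects described by one attribute "color"; X is a target set.
universe = {1, 2, 3, 4, 5, 6}
color = {1: "red", 2: "red", 3: "blue", 4: "blue", 5: "green", 6: "green"}
classes = partition(universe, color.get)
X = {1, 2, 3}
print(lower_approximation(classes, X))  # {1, 2}  (the "red" class)
print(upper_approximation(classes, X))  # {1, 2, 3, 4}  (adds the "blue" class)
```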

Rough Sets – Theory

Learning Rules from Very Large Databases Using Rough Multisets
Abstract
This paper presents a mechanism called LERS-M for learning production rules from very large databases. It can be implemented using object-relational database systems, it can be used for distributed data mining, and it has a structure that matches well with parallel processing. LERS-M is based on rough multisets and is formulated using relational operations, with the objective of being tightly coupled with database systems. The underlying representation used by LERS-M is multiset decision tables, which are derived from information multisystems. In addition, it is shown that multiset decision tables provide a simple way to compute Dempster-Shafer basic probability assignment functions from data sets.
Chien-Chung Chan
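The last claim of the abstract, that decision tables yield Dempster-Shafer basic probability assignments, can be sketched as follows. This is a hedged illustration of one standard construction, not LERS-M itself: each class of indiscernible condition values contributes its relative size as mass to the set of decisions observed in that class.

```python
# Illustrative sketch: a basic probability assignment from a decision table.
from collections import Counter

def bpa_from_table(rows):
    """rows: list of (condition_tuple, decision) pairs.
    Each condition class contributes mass |class| / |U| to the set of
    decisions observed in it."""
    groups = {}
    for cond, dec in rows:
        groups.setdefault(cond, set()).add(dec)
    sizes = Counter(cond for cond, _ in rows)
    total = len(rows)
    bpa = Counter()
    for cond, decs in groups.items():
        bpa[frozenset(decs)] += sizes[cond] / total
    return dict(bpa)

rows = [(("high",), "flu"), (("high",), "cold"),
        (("normal",), "healthy"), (("normal",), "healthy")]
print(bpa_from_table(rows))
# {frozenset({'flu', 'cold'}): 0.5, frozenset({'healthy'}): 0.5}
```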
Data with Missing Attribute Values: Generalization of Indiscernibility Relation and Rule Induction
Abstract
Data sets, described by decision tables, are incomplete when for some cases (examples, objects) the corresponding attribute values are missing, e.g., are lost or represent “do not care” conditions. This paper presents a useful technique for working with incomplete decision tables based on blocks of attribute-value pairs. Incomplete decision tables are described by characteristic relations, in the same way that complete decision tables are described by indiscernibility relations. These characteristic relations are conveniently determined by blocks of attribute-value pairs. Three different kinds of lower and upper approximations for incomplete decision tables may be easily computed from characteristic relations. All three definitions reduce to the standard definition based on the indiscernibility relation when the decision table is complete. This paper shows how to induce certain and possible rules for incomplete decision tables using MLEM2, an outgrowth of the rule induction algorithm LEM2, again using blocks of attribute-value pairs. Additionally, MLEM2 can induce rules from incomplete decision tables with numerical attributes.
Jerzy W. Grzymala-Busse
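A minimal sketch of the blocks and characteristic sets described above, assuming the usual convention that "?" marks a lost value and "*" a "do not care" condition; the data representation and function names are ours.

```python
# Blocks of attribute-value pairs and characteristic sets (illustrative).
LOST, DONT_CARE = "?", "*"

def blocks(table, attr):
    """Block [(attr, v)]: cases whose value is v, plus 'do not care' cases."""
    values = {row[attr] for row in table.values()
              if row[attr] not in (LOST, DONT_CARE)}
    return {v: {x for x, row in table.items()
                if row[attr] == v or row[attr] == DONT_CARE}
            for v in values}

def characteristic_set(table, attrs, x):
    """K(x): intersection, over attributes, of the blocks of x's known
    values; lost and 'do not care' values put no constraint on x itself."""
    result = set(table)                      # start from the whole universe U
    for a in attrs:
        v = table[x][a]
        if v not in (LOST, DONT_CARE):
            result &= blocks(table, a)[v]
    return result

table = {
    1: {"temp": "high", "headache": "yes"},
    2: {"temp": "high", "headache": LOST},
    3: {"temp": DONT_CARE, "headache": "no"},
    4: {"temp": "normal", "headache": "yes"},
}
print(characteristic_set(table, ["temp", "headache"], 1))  # {1}
print(characteristic_set(table, ["temp", "headache"], 2))  # {1, 2, 3}
```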
Generalizations of Rough Sets and Rule Extraction
Abstract
In this paper, two kinds of generalizations of rough sets are proposed based on two different interpretations of rough sets: one is an interpretation of rough sets as approximation of a set by means of elementary sets and the other is an interpretation of rough sets as classification of objects into three different classes, i.e., positive objects, negative objects and boundary objects. Under each interpretation, two different definitions of rough sets are given depending on the problem setting. The fundamental properties are shown. The relations between generalized rough sets are given. Moreover, rule extraction underlying each rough set is discussed. It is shown that rules are extracted based on modified decision matrices. A simple example is given to show the differences in the extracted rules by underlying rough sets.
Masahiro Inuiguchi
Towards Scalable Algorithms for Discovering Rough Set Reducts
Abstract
Rough set theory allows one to find reducts from a decision table, i.e., minimal sets of attributes preserving the required quality of classification. In this article, we propose a number of algorithms for discovering all generalized reducts (preserving generalized decisions), all possible reducts (preserving upper approximations) and all certain reducts (preserving lower approximations). The new RAD and CoreRAD algorithms we propose discover exact reducts. They require, however, the determination of all maximal attribute sets that are not supersets of reducts. In cases where their determination is infeasible, we propose the GRA and CoreGRA algorithms, which search for approximate reducts. These two algorithms are well suited to the discovery of supersets of reducts from very large decision tables.
Marzena Kryszkiewicz, Katarzyna Cichoń
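For intuition, the following brute-force sketch shows what a decision reduct is, a minimal attribute subset preserving the classification, by plain enumeration. It is our illustration of the definition only: this is exactly the exponential search that scalable algorithms such as those above are designed to avoid on large tables.

```python
# Brute-force reduct search (exponential; for intuition only).
from itertools import combinations

def preserves_decision(table, decision, attrs):
    """True if objects indiscernible on `attrs` always share a decision
    (assumes a consistent decision table)."""
    seen = {}
    for x, row in table.items():
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, decision[x]) != decision[x]:
            return False
    return True

def all_reducts(table, decision, all_attrs):
    """All minimal attribute subsets preserving the classification."""
    reducts = []
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue                     # proper superset of a reduct
            if preserves_decision(table, decision, subset):
                reducts.append(subset)
    return reducts

table = {1: {"a": 0, "b": 0}, 2: {"a": 0, "b": 1}, 3: {"a": 1, "b": 1}}
decision = {1: "no", 2: "yes", 3: "yes"}
print(all_reducts(table, decision, ["a", "b"]))  # [('b',)]
```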
Variable Precision Fuzzy Rough Sets
Abstract
In this paper the variable precision fuzzy rough sets (VPFRS) concept is considered. The notions of the fuzzy inclusion set and the α-inclusion error based on residual implicators are introduced. The level of misclassification is expressed by means of α-cuts of the fuzzy inclusion set. Next, the use of mean fuzzy rough approximations is postulated and discussed. The concept of VPFRS is defined using the extended version of the variable precision rough sets (VPRS) model, which allows for levels of misclassification expressed by two parameters: a lower (l) and an upper (u) limit. Remarks concerning the variable precision rough fuzzy sets (VPRFS) idea are given. An example illustrates the proposed VPFRS model.
Alicja Mieszkowicz-Rolka, Leszek Rolka
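The crisp VPRS base model with the two limits l and u mentioned in the abstract can be sketched as follows (our rendering; the paper's contribution, the fuzzy generalisation via residual implicators, is omitted here).

```python
# Crisp sketch of the extended VPRS model with asymmetric limits l < u.
def inclusion_degree(E, X):
    """|E ∩ X| / |E|: the fraction of class E that falls inside X."""
    return len(E & X) / len(E)

def vprs_approximations(classes, X, l, u):
    """u-lower approximation (classes included at degree >= u) and
    l-upper approximation (classes included at degree > l)."""
    lower = set().union(*[E for E in classes
                          if inclusion_degree(E, X) >= u] or [set()])
    upper = set().union(*[E for E in classes
                          if inclusion_degree(E, X) > l] or [set()])
    return lower, upper

classes = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9, 10}]
X = {1, 2, 3, 5, 6, 7, 8}
lower, upper = vprs_approximations(classes, X, l=0.3, u=0.75)
print(lower)  # {1, ..., 6}: inclusion degrees 0.75 and 1.0 pass u
print(upper)  # {1, ..., 10}: the third class enters at degree 0.5 > l
```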
Greedy Algorithm of Decision Tree Construction for Real Data Tables
Abstract
In the paper a greedy algorithm for minimization of decision tree depth is described and bounds on the algorithm's precision are considered. The algorithm is applicable to data tables with both discrete and continuous variables, which may have missing values. Under some natural assumptions on the class NP and on the class of considered tables, the algorithm is apparently close to the best approximate polynomial algorithms for minimization of decision tree depth.
Mikhail Ju. Moshkov
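One plausible rendering of such a depth-oriented greedy strategy (an assumption on our part; the paper's exact measure and its missing-value handling may differ) chooses, at each node, the attribute whose worst branch leaves the fewest pairs of rows with different decisions.

```python
# Hedged sketch of greedy decision tree construction by a worst-branch measure.
from collections import Counter

def boundary_pairs(rows):
    """Number of unordered row pairs carrying different decisions."""
    counts = Counter(d for _, d in rows)
    n = len(rows)
    return (n * (n - 1) - sum(c * (c - 1) for c in counts.values())) // 2

def build_tree(rows, attrs):
    """rows: list of (attribute_dict, decision) pairs."""
    decisions = [d for _, d in rows]
    if len(set(decisions)) == 1 or not attrs:
        return Counter(decisions).most_common(1)[0][0]  # leaf: majority vote
    def split(a):
        branches = {}
        for row, d in rows:
            branches.setdefault(row[a], []).append((row, d))
        return branches
    # Greedy choice: attribute whose worst branch has fewest boundary pairs.
    best = min(attrs, key=lambda a: max(boundary_pairs(b)
                                        for b in split(a).values()))
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(b, rest) for v, b in split(best).items()})

rows = [({"x": 0, "y": 0}, "no"),
        ({"x": 0, "y": 1}, "yes"),
        ({"x": 1, "y": 1}, "yes")]
print(build_tree(rows, ["x", "y"]))  # ('y', {0: 'no', 1: 'yes'})
```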
Consistency Measures for Conflict Profiles
Abstract
The formal definition of conflict was formulated and analyzed by Pawlak, who presented the concept and structure of conflicts. In this concept a conflict may be represented by an information system (U, A), where U is a set of agents taking part in the conflict and A is a set of attributes representing the conflict issues. On the basis of the information system Pawlak also defined various measures describing conflicts, for example the measure of the military potential of the conflict sites. The concept has since been developed by other authors, who defined a multi-valued structure of conflict and proposed using consensus methods for solving conflicts. In this paper the authors present the definition of consistency functions, which make it possible to measure the degree of consistency of conflict profiles. A conflict profile is defined as a set of opinions of agents referring to the subject of the conflict. Using this degree one can choose the method for solving the conflict, for example a negotiation method or a consensus method. A set of postulates for consistency functions is defined and analyzed. In addition, some concrete consistency functions are formulated and their properties with respect to the postulates are examined.
Ngoc Thanh Nguyen, Michal Malowiecki
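By way of illustration, here is one simple consistency function of the kind the abstract describes (our example, not necessarily one of the functions analysed in the paper): consistency is one minus the normalised mean pairwise distance between the opinions in the profile.

```python
# An illustrative consistency function for a conflict profile.
from itertools import combinations

def consistency(profile, distance, d_max):
    """profile: list of opinions; distance: metric on opinions;
    d_max: the largest possible distance (normalisation constant)."""
    if len(profile) < 2:
        return 1.0                  # a single opinion is fully consistent
    pairs = list(combinations(profile, 2))
    mean_d = sum(distance(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_d / d_max

# Opinions on a [0, 10] scale.
print(consistency([7, 7, 7], lambda a, b: abs(a - b), d_max=10))  # 1.0
print(consistency([0, 10], lambda a, b: abs(a - b), d_max=10))    # 0.0
```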
Layered Learning for Concept Synthesis
Abstract
We present a hierarchical scheme for synthesis of concept approximations based on given data and domain knowledge. We also propose a solution, founded on rough set theory, to the problem of constructing the approximation of higher level concepts by composing the approximation of lower level concepts. We examine the effectiveness of the layered learning approach by comparing it with the standard learning approach. Experiments are carried out on artificial data sets generated by a road traffic simulator.
Sinh Hoa Nguyen, Jan Bazan, Andrzej Skowron, Hung Son Nguyen
Basic Algorithms and Tools for Rough Non-deterministic Information Analysis
Abstract
Rough non-deterministic information analysis is a framework for handling, on computers, rough-set-based concepts that are defined not only in DISs (Deterministic Information Systems) but also in NISs (Non-deterministic Information Systems). NISs were proposed for dealing with information incompleteness in DISs. In this paper, two modalities, i.e., certainty and possibility, are defined for each concept, such as the definability of a set, the consistency of an object, data dependency, rule generation, reduction of attributes, and criteria of rule support, accuracy and coverage. Then, algorithms for computing the two modalities are investigated. An important problem is how to compute the two modalities over all derived DISs. A simple method that computes the two modalities sequentially in all derived DISs is not suitable, because the number of derived DISs grows exponentially. This problem is solved uniformly by applying either inf and sup information or possible equivalence relations. An information analysis tool for NISs is also presented.
Hiroshi Sakai, Akimichi Okuma
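A minimal sketch of the two modalities for a single descriptor (a, v) in a NIS, in our simplified rendering: membership is certain when an object's candidate value set is exactly {v}, and possible when v is among the candidates. For this basic case, no enumeration of the exponentially many derived DISs is needed.

```python
# Certainty and possibility for one descriptor in a NIS (illustrative).
def certain_block(nis, attr, v):
    """Objects to which every derived DIS assigns value v for attr."""
    return {x for x, row in nis.items() if row[attr] == {v}}

def possible_block(nis, attr, v):
    """Objects to which some derived DIS assigns value v for attr."""
    return {x for x, row in nis.items() if v in row[attr]}

nis = {
    1: {"size": {"big"}},
    2: {"size": {"big", "small"}},  # non-deterministic: either value possible
    3: {"size": {"small"}},
}
print(certain_block(nis, "size", "big"))   # {1}
print(possible_block(nis, "size", "big"))  # {1, 2}
```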
A Partition Model of Granular Computing
Abstract
There are two objectives of this chapter. One objective is to examine the basic principles and issues of granular computing. We focus on the tasks of granulation and computing with granules. From semantic and algorithmic perspectives, we study the construction, interpretation, and representation of granules, as well as principles and operations of computing and reasoning with granules. The other objective is to study a partition model of granular computing in a set-theoretic setting. The model is based on the assumption that a finite universe is granulated through a family of pairwise disjoint subsets. A hierarchy of granulations is modeled by the notion of the partition lattice. The model is developed by combining, reformulating, and reinterpreting notions and results from several related fields, including theories of granularity, abstraction and generalization (artificial intelligence), partition models of databases, coarsening and refining operations (evidential theory), set approximations (rough set theory), and the quotient space theory for problem solving.
Yiyu Yao
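The partition-lattice ordering mentioned above reduces to a simple containment test, sketched here (names are ours): one partition refines another when each of its blocks lies inside some block of the coarser partition.

```python
# Refinement ordering on partitions, the basis of the partition lattice.
def refines(finer, coarser):
    """True if every block of `finer` is contained in a block of `coarser`."""
    return all(any(b <= c for c in coarser) for b in finer)

fine = [{1}, {2}, {3, 4}, {5, 6}]
coarse = [{1, 2}, {3, 4, 5, 6}]
print(refines(fine, coarse))   # True
print(refines(coarse, fine))   # False
```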

Rough Sets – Applications

Musical Phrase Representation and Recognition by Means of Neural Networks and Rough Sets
Abstract
This paper discusses various musical phrase representations that can be used to classify musical phrases with considerable accuracy. Musical phrase analysis plays an important role in the music information retrieval domain. In the paper various representations of a musical phrase are described and analyzed. Experiments were also designed to facilitate pitch prediction within a musical phrase by means of entropy coding of music, using the concept of predictive data coding introduced by Shannon. Encoded music representations, stored in a database, are then used for automatic recognition of musical phrases by means of neural networks (NN) and rough sets (RS). The obtained results are discussed and conclusions are drawn.
Andrzej Czyzewski, Marek Szczerba, Bozena Kostek
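As an illustration of the predictive-coding idea, here is a toy bigram pitch predictor (our sketch, not the representation actually used in the paper): the model counts pitch transitions in training phrases and predicts the most frequent successor.

```python
# Toy bigram pitch predictor in the spirit of predictive data coding.
from collections import Counter, defaultdict

def train_bigram(phrases):
    """Count pitch-to-pitch transitions over a corpus of phrases."""
    model = defaultdict(Counter)
    for phrase in phrases:
        for prev, nxt in zip(phrase, phrase[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, pitch):
    """Most frequent successor of `pitch`, or None if unseen."""
    counts = model.get(pitch)
    return counts.most_common(1)[0][0] if counts else None

phrases = [["C4", "E4", "G4", "C5"], ["C4", "E4", "G4", "E4"]]
model = train_bigram(phrases)
print(predict_next(model, "G4"))  # "C5" (ties broken by first-seen order)
```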
Processing of Musical Metadata Employing Pawlak’s Flow Graphs
Abstract
The objective of the presented research is to enable music retrieval based on intelligent analysis of metadata contained in musical databases. A database was constructed for the purpose of this study, including textual data related to approximately 500 compact discs representing various categories of music. The description format of musical recordings stored in the database is compatible with the format of the widely used CDDB database available on the Internet. An advanced query algorithm was prepared employing the concept of inference rule derivation from flow graphs, introduced recently by Pawlak. The created database search engine utilizes knowledge acquired in advance and stored in flow graphs in order to enable searching CD records.
Bozena Kostek, Andrzej Czyzewski
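The branch coefficients of Pawlak's flow graphs can be sketched as follows (a hedged illustration; the paper's querying machinery is omitted): each branch carries a flow, from which its strength, certainty and coverage are derived, playing roles analogous to rule support and confidence in the induced inference rules.

```python
# Branch coefficients of a flow graph (illustrative rendering).
def flow_coefficients(flows):
    """flows: {(x, y): throughflow}. Returns {(x, y): (strength, cer, cov)}."""
    total = sum(flows.values())
    out_flow, in_flow = {}, {}
    for (x, y), f in flows.items():
        out_flow[x] = out_flow.get(x, 0) + f
        in_flow[y] = in_flow.get(y, 0) + f
    return {(x, y): (f / total,          # strength: share of the total flow
                     f / out_flow[x],    # certainty: share of x's outflow
                     f / in_flow[y])     # coverage: share of y's inflow
            for (x, y), f in flows.items()}

# Toy graph: music genres flowing into two CD categories.
flows = {("rock", "cd_A"): 30, ("rock", "cd_B"): 10, ("jazz", "cd_B"): 60}
for branch, (s, cer, cov) in flow_coefficients(flows).items():
    print(branch, round(s, 2), round(cer, 2), round(cov, 2))
```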
Data Decomposition and Decision Rule Joining for Classification of Data with Missing Values
Abstract
In this paper we present a new approach to handling incomplete information and reducing classifier complexity. We describe a method, called D3RJ, that performs data decomposition and decision rule joining to avoid the necessity of reasoning with missing attribute values. As a consequence, a more complex reasoning process is needed than in the case of known algorithms for induction of decision rules. The original incomplete data table is decomposed into sub-tables without missing values. Next, methods for induction of decision rules are applied to these sets. Finally, an algorithm for decision rule joining is used to obtain the final rule set from the partial rule sets. Using the D3RJ method it is possible to obtain a smaller set of rules and better classification accuracy than with classic decision rule induction methods. We provide an empirical evaluation of the accuracy and model size of the D3RJ method on data with missing values of natural origin.
Rafał Latkowski, Michał Mikołajczyk
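A simplified sketch of the decomposition step only (our reading; the rule induction and rule joining stages of D3RJ are omitted, and the actual decomposition in the paper may group rows differently): rows are bucketed by their pattern of known attributes and projected onto those attributes, yielding sub-tables free of missing values.

```python
# Decomposing an incomplete table into complete sub-tables (illustrative).
from collections import defaultdict

MISSING = None

def decompose(table):
    """table: list of dicts, possibly containing MISSING values.
    Returns {frozenset_of_known_attrs: list of complete rows}."""
    subtables = defaultdict(list)
    for row in table:
        known = frozenset(a for a, v in row.items() if v is not MISSING)
        subtables[known].append({a: row[a] for a in known})
    return dict(subtables)

table = [
    {"a": 1, "b": 2, "d": "yes"},
    {"a": 1, "b": MISSING, "d": "no"},
    {"a": MISSING, "b": 3, "d": "no"},
]
for attrs, rows in decompose(table).items():
    print(sorted(attrs), rows)
```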
Rough Sets and Relational Learning
Abstract
Rough Set Theory is a mathematical tool to deal with vagueness and uncertainty. Rough Set Theory uses a single information table. Relational Learning is the learning from multiple relations or tables. Recent research in Rough Set Theory includes the extension of Rough Set Theory to Relational Learning. A brief overview of the work in Rough Sets and Relational Learning is presented.
The authors’ work in this area is then presented. Inductive Logic Programming (ILP) is one of the main approaches to Relational Learning. The generic Rough Set Inductive Logic Programming model introduces a rough setting in ILP. The Variable Precision Rough Set Inductive Logic Programming model (VPRSILP model) extends the Variable Precision Rough Set model to ILP.
In the cVPRSILP approach based on the VPRSILP model, elementary sets are defined using attributes that are based on a finite number of clauses of interest. However, this results in the number of elementary sets being very large. So, only significant elementary sets are used, and test cases are classified based on their proximity to the significant elementary sets.
The utility of this approach is shown in classification experiments in predictive toxicology.
R. S. Milton, V. Uma Maheswari, Arul Siromoney
Approximation Space for Software Models
Abstract
This article introduces an approximation space for graded acceptance of proposed software models for system design, relative to design patterns that conform to a system design standard. A fundamental problem in system design is that feature values extracted from experimental design models tend not to match exactly the patterns associated with standard design models. It is not generally known how to measure the extent to which a particular system design conforms to a standard design pattern. The rough set approach introduced by Zdzislaw Pawlak provides a ground for concluding to what degree a particular model for a system design is part of a set of models representing a standard. To some extent, this research takes into account observations made by Christopher Alexander about the idea of form in Plato's philosophy, which is helpful in arriving at an understanding of design patterns used to classify similar models for system designs. The basic assumption made in this research is that every system design can be approximated relative to a standard, and that it is possible to prescribe conditions for the construction of a set of acceptable software models. The template method and memento behavioral design patterns are briefly considered by way of illustration. An approximation space for software models is introduced. In addition, an approach to the satisfaction-based classification of software models is presented.
James F. Peters, Sheela Ramanna
Application of Rough Sets to Environmental Engineering Models
Abstract
Rough Sets is a method of dealing with domains characterised by inconsistent and incomplete information. We apply this method to the problem of Environmental Engineering modelling. The solid waste management problem, in particular, is the subject of our analysis.
Modelling large engineering problems is difficult because of the volume of information processed and the number of modelling decisions to be made. In many cases, a chicken-and-egg problem presents itself when modelling new or one-of-a-kind systems: the model is needed to gain the knowledge necessary for constructing the model.
The generally accepted solution is to iteratively verify the importance of a parameter or model change until a concise model is created which appropriately supports the decision making process. We improve on this process by using Rough Sets to actively search for simplifying assumptions of the model, and we validate the approach using a municipal solid waste management case.
Robert H. Warren, Julia A. Johnson, Gordon H. Huang
Rough Set Theory and Decision Rules in Data Analysis of Breast Cancer Patients
Abstract
In this paper an approach based on rough set theory and the induction of decision rules is applied to analyse relationships between condition attributes describing breast cancer patients and their treatment results. The data set contains 228 breast cancer patients described by 16 attributes and is divided into two classes: the 1st class – patients who had not experienced cancer recurrence; the 2nd class – patients who had cancer recurrence. In the first phase of the analysis, the rough set based approach is applied to determine attribute importance for the patients' classification. A set of selected attributes ensuring high quality of classification was obtained. Then, decision rules were generated by means of an algorithm inducing a minimal cover of the learning examples. The usefulness of these rules for predicting therapy results was evaluated by means of the cross-validation technique. Moreover, the syntax of selected rules was interpreted by physicians. Proceeding in this way, they formulated some indications which may be helpful in making decisions referring to the treatment of breast cancer patients. To sum up, this paper presents a case study of applying rough set theory to analyse medical data.
Jerzy Załuski, Renata Szoszkiewicz, Jerzy Krysiński, Jerzy Stefanowski
Independent Component Analysis, Principal Component Analysis and Rough Sets in Face Recognition
Abstract
The paper describes hybrid methods of face recognition based on independent component analysis, principal component analysis and rough set theory. Feature extraction and pattern formation from face images are performed using Independent Component Analysis and Principal Component Analysis. Feature selection/reduction is realized using rough set techniques. The face recognition system is designed as a rough set rule-based classifier.
Roman W. Świniarski, Andrzej Skowron
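The PCA stage of such a hybrid pipeline can be sketched generically with NumPy (our illustration; the ICA and rough set stages are omitted): face images are flattened to vectors and projected onto the top principal directions obtained via SVD.

```python
# Generic PCA feature extraction for image vectors (illustrative).
import numpy as np

def pca_features(images, k):
    """images: (n_samples, n_pixels) array; returns (features, components)."""
    X = images - images.mean(axis=0)             # centre the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:k]                          # top-k principal directions
    return X @ components.T, components          # k-dim feature per image

rng = np.random.default_rng(0)
faces = rng.random((20, 64 * 64))                # 20 stand-in "images"
features, components = pca_features(faces, k=10)
print(features.shape)                            # (20, 10)
```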
Backmatter
Metadata
Title
Transactions on Rough Sets I
Editors
James F. Peters
Andrzej Skowron
Jerzy W. Grzymała-Busse
Bożena Kostek
Roman W. Świniarski
Marcin S. Szczuka
Copyright Year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-27794-1
Print ISBN
978-3-540-22374-0
DOI
https://doi.org/10.1007/b98175