
2008 | Book

Transactions on Rough Sets IX

Edited by: James F. Peters, Andrzej Skowron, Henryk Rybiński

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter
Vagueness and Roughness
Abstract
The paper proposes a new formal approach to vagueness and vague sets, taking inspiration from Pawlak’s rough set theory. Following a brief introduction to the problem of vagueness, an approach to the conceptualization and representation of vague knowledge is presented from a number of different perspectives: those of logic, set theory, algebra, and computer science. The central notion of the vague set, in relation to the rough set, is defined as a family of sets approximated by so-called lower and upper limits. The family is simultaneously considered as the family of all denotations of sharp terms representing a suitable vague term, from the agent’s point of view. Some algebraic operations on vague sets and their properties are defined. Some important conditions concerning the membership relation for vague sets, in connection with Blizard’s multisets and Zadeh’s fuzzy sets, are established as well. A classical outlook on a logic of vague sentences (vague logic) based on vague sets is also discussed.
Zbigniew Bonikowski, Urszula Wybraniec-Skardowska
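The rough set approximations that the vague set’s lower and upper limits generalize can be sketched in a few lines; the universe, descriptions and target concept below are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of Pawlak's lower and upper approximations.
# Universe, attribute description, and concept are toy examples.

def indiscernibility_classes(universe, description):
    """Partition the universe into classes of objects with equal descriptions."""
    classes = {}
    for x in universe:
        classes.setdefault(description(x), set()).add(x)
    return list(classes.values())

def approximations(universe, description, concept):
    """Return (lower, upper) approximations of `concept` (a set of objects)."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(universe, description):
        if block <= concept:          # block lies entirely inside the concept
            lower |= block
        if block & concept:           # block overlaps the concept
            upper |= block
    return lower, upper

# Toy example: objects described by a single attribute value.
U = {1, 2, 3, 4, 5, 6}
desc = lambda x: x % 3               # indiscernible iff equal residues mod 3
X = {1, 2, 4}                        # target concept
lo, up = approximations(U, desc, X)
# lower = {1, 4}; upper = {1, 2, 4, 5}
```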
Modified Indiscernibility Relation in the Theory of Rough Sets with Real-Valued Attributes: Application to Recognition of Fraunhofer Diffraction Patterns
Abstract
The goal of the paper is to present a modification of the classical indiscernibility relation, dedicated to rough set theory in a real-valued attribute space. Contrary to some other known generalizations, the indiscernibility relation modified here remains an equivalence relation; it is obtained by introducing a structure into the collection of attributes. This structure defines real-valued subspaces, used in multidimensional cluster analysis, which partition the universe in a more natural way than the one-dimensional discretization iterated in the classical model. Since the classical model is a special, extreme case of our modification, the modified version can be considered more general. More importantly, it allows for natural processing of real-valued attributes in rough set theory, broadening the scope of applications of the classical as well as the variable precision rough set model, since the latter can utilize the proposed modification equally well. In a case study, we show a real application of the modified relation: a hybrid, opto-electronic recognizer of Fraunhofer diffraction patterns. Modified rough sets are used in an evolutionary optimization of the optical feature extractor, implemented as a holographic ring-wedge detector. The classification is performed by a probabilistic neural network, whose error, assessed in an unbiased way, is compared to earlier works.
Krzysztof A. Cyran
On Certain Rough Inclusion Functions
Abstract
In this article we further explore the idea which led to the standard rough inclusion function. As a result, two more rough inclusion functions (RIFs for short) are obtained, different from the standard one and from each other. With every RIF we associate a mapping which is in some sense complementary to it. Next, these complementary mappings (co-RIFs) are used to define certain metrics. As it turns out, one of these distance functions is an instance of the Marczewski–Steinhaus metric. While the distance functions may be used directly to measure the degree of dissimilarity of sets of objects, their complementary mappings – also discussed here – are useful in measuring the degree of mutual similarity of sets.
Anna Gomolińska
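The standard rough inclusion function and the Marczewski–Steinhaus metric mentioned in the abstract are standard notions and can be sketched as follows; the paper’s own RIF variants and co-RIFs are not reproduced here.

```python
# Standard rough inclusion function (SRIF) and Marczewski–Steinhaus metric.
# These are textbook definitions, not the new RIFs derived in the article.

def standard_rif(x, y):
    """Degree to which set x is included in set y; 1 for empty x by convention."""
    return len(x & y) / len(x) if x else 1.0

def ms_distance(x, y):
    """Marczewski–Steinhaus metric: symmetric difference over union."""
    union = x | y
    return len(x ^ y) / len(union) if union else 0.0

a, b = {1, 2, 3, 4}, {3, 4, 5}
# standard_rif(a, b) == 0.5  (two of a's four elements lie in b)
# ms_distance(a, b) == 0.6   (|{1,2,5}| / |{1,2,3,4,5}|)
```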
Automatic Rhythm Retrieval from Musical Files
Abstract
This paper presents a comparison of the effectiveness of two computational intelligence approaches applied to the task of retrieving rhythmic structure from musical files. The method proposed by the authors of this paper generates rhythmic levels first, and then uses these levels to compose rhythmic hypotheses. Three phases are examined within this study: creating periods, creating simplified hypotheses and creating full hypotheses. All experiments are conducted on a database of national anthems. Decision systems such as artificial neural networks and rough sets are employed to search for the metric structure of musical files, based on examining the physical attributes of sound that are important in determining whether a particular sound falls at an accented location of a musical piece. The results of the experiments show that both decision systems identify note duration as the most significant parameter in the automatic search for the metric structure of rhythm in musical files. A brief description of an application realizing automatic rhythm accompaniment is also presented.
Bożena Kostek, Jarosław Wójcik, Piotr Szczuko
FUN: Fast Discovery of Minimal Sets of Attributes Functionally Determining a Decision Attribute
Abstract
In this paper, we present our Fun algorithm for discovering minimal sets of conditional attributes functionally determining a given dependent attribute. In particular, the algorithm is capable of discovering rough set certain, generalized decision, and membership distribution reducts. Fun can operate either on partitions of objects or, alternatively, on stripped partitions, which do not store singleton groups. It is capable of using functional dependencies occurring among conditional attributes for pruning candidate dependencies. In this paper, we offer a further reduction of stripped partitions, which allows correct determination of minimal functional dependencies provided optional candidate pruning is not carried out. In the paper we consider six variants of Fun, including two new variants using reduced stripped partitions. We have carried out a number of experiments on benchmark data sets to test the efficiency of all variants of Fun. We have also tested the efficiency of Fun’s variants against the Rosetta and RSES toolkits’ algorithms computing all reducts, and against Tane, which is one of the most efficient algorithms computing all minimal functional dependencies. The experiments prove that Fun is up to three orders of magnitude faster than the Rosetta and RSES toolkits’ algorithms, and up to 30 times faster than Tane.
Marzena Kryszkiewicz, Piotr Lasek
Information Granulation: A Medical Case Study
Abstract
Granular computing is a multidisciplinary theory that has developed rapidly in recent years. It provides a conceptual framework for many research fields, among others data mining. Data mining techniques and algorithms focus on knowledge discovery from data. When data labels are unknown, one can use methods of exploratory data analysis called clustering algorithms. Clustering algorithms are also useful for finding hidden dependencies and patterns in data. In this article, granular computing and clustering are implemented in the information granulation system SOSIG and applied to the exploration of a real medical data set. Data granulation in the system can be performed at different levels of resolution; thereby the granules composed of clusters reflect relationships between objects at distinct levels of detail. The clustering in SOSIG is generated automatically: there is no need to specify the number of groups for the division. This eliminates problems present in popular clustering algorithms, such as selecting the correct number of clusters and evaluating the created partitioning, difficulties encountered in most partitioning as well as hierarchical methods, which reduce their practical applicability.
Additionally, this article compares the solution generated by SOSIG with the clustering results of the k-means, hierarchical, EM and DBSCAN algorithms, using quality indices such as Dunn’s, DB, CDbw and SI.
Urszula Kużelewska, Jarosław Stepaniuk
Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation
Abstract
Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of magnetic resonance (MR) images. In this paper, the rough-fuzzy c-means (RFCM) algorithm is presented for the segmentation of brain MR images. The RFCM algorithm comprises a judicious integration of rough sets, fuzzy sets, and the c-means algorithm. While the concept of lower and upper approximations of rough sets deals with vagueness and incompleteness in the class definition of brain MR images, the membership function of fuzzy sets enables efficient handling of overlapping classes. The crisp lower bound and fuzzy boundary of a class, introduced in the RFCM algorithm, enable efficient segmentation of brain MR images. One of the major issues of RFCM-based brain MR image segmentation is how to select initial prototypes of different classes or categories. The concept of discriminant analysis, based on the maximization of class separability, is used to circumvent the initialization and local minima problems of the RFCM. Some quantitative indices are introduced to extract local features of brain MR images for accurate segmentation. The effectiveness of the RFCM algorithm, along with a comparison with other related algorithms, is demonstrated on a set of brain MR images.
Pradipta Maji, Sankar K. Pal
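A much simplified sketch of the centroid update at the heart of rough-fuzzy c-means: points with high membership form a crisp lower approximation, the rest a fuzzy boundary, and the new centroid mixes the two regions. The threshold, weight `w` and fuzzifier `m` are illustrative assumptions, not values from the paper.

```python
# Simplified single-cluster RFCM centroid update (illustrative only).
import numpy as np

def rfcm_centroid(points, memberships, threshold=0.5, w=0.95, m=2.0):
    """points: (n, d) array; memberships: (n,) membership of each point
    in this cluster. Returns the updated centroid."""
    lower = memberships >= threshold       # crisp lower approximation
    boundary = ~lower                      # fuzzy boundary region
    if lower.any() and boundary.any():
        crisp = points[lower].mean(axis=0)
        u = memberships[boundary] ** m
        fuzzy = (u[:, None] * points[boundary]).sum(axis=0) / u.sum()
        return w * crisp + (1.0 - w) * fuzzy
    if lower.any():                        # no boundary: plain mean
        return points[lower].mean(axis=0)
    u = memberships ** m                   # no lower region: plain FCM update
    return (u[:, None] * points).sum(axis=0) / u.sum()
```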
Approximation Schemes in Logic and Artificial Intelligence
Abstract
Approximate reasoning is used in a variety of reasoning tasks in logic-based artificial intelligence. In this paper we present several such reasoning schemes and show how they relate and differ from the approach of Pawlak’s Rough Sets.
Victor W. Marek, Mirosław Truszczyński
Decision Rule Based Data Models Using NetTRS System Overview
Abstract
The NetTRS system is a web service that makes induction, evaluation and postprocessing of decision rules possible. The TRS library is the kernel of the system; it allows rules to be induced by means of the tolerance rough sets model. NetTRS makes the user interface of the TRS library available on the Internet. The main emphasis of the NetTRS system is placed on induction and postprocessing of decision rules. This article shows the architecture and the functionality of the system, and also describes the parameterization of the algorithms implemented in the TRS library.
Marcin Michalak, Marek Sikora
A Rough Set Based Approach for ECG Classification
Abstract
An inference engine for the classification of electrocardiogram (ECG) signals is developed with the help of a rule-based rough set decision system. For this purpose, an automated system for extracting data from ECG strips is developed using a few image processing techniques. A knowledge base is built after consulting different medical books as well as the feedback of reputed cardiologists on the interpretation and selection of essential time-plane features of the ECG signal. An algorithm for the extraction of different time-domain features is also developed with the help of differentiation techniques and syntactic approaches. Finally, a rule-based rough set decision system is generated from these time-plane features to build an inference engine for disease classification. Two sets of rules are generated for this purpose: the first for the general separation of normal and diseased subjects, and the second for classification between different diseases.
Sucharita Mitra, M. Mitra, B. B. Chaudhuri
Universal Problem of Attribute Reduction
Abstract
In the paper, some generalizations of the notions of reduct, test (superreduct), partial (approximate) reduct and partial test are considered. The accuracy of a greedy algorithm for the construction of partial tests is investigated. A lower bound on the minimal cardinality of partial reducts, based on information about the greedy algorithm’s work, is studied. A bound on the precision of the greedy algorithm which does not depend on the number of pairs of rows of a decision table that should be separated is obtained. Results of experiments with the greedy algorithm are discussed.
Mikhail Ju. Moshkov, Marcin Piliszczuk, Beata Zielosko
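A rough sketch of the kind of greedy algorithm for partial tests the paper analyzes: repeatedly pick the attribute separating the most still-unseparated pairs of rows with different decisions, stopping once at most a fraction `alpha` of the pairs remains. The table layout is an illustrative assumption.

```python
# Greedy construction of an alpha-partial test (illustrative sketch).
from itertools import combinations

def greedy_partial_test(rows, decisions, alpha=0.0):
    """rows: list of attribute-value tuples; decisions: parallel list.
    Returns a list of attribute indices forming an alpha-partial test."""
    pairs = {(i, j) for i, j in combinations(range(len(rows)), 2)
             if decisions[i] != decisions[j]}
    allowed = int(alpha * len(pairs))   # pairs we may leave unseparated
    n_attrs = len(rows[0])
    test = []
    while len(pairs) > allowed:
        # count remaining pairs each attribute would separate
        counts = [sum(rows[i][a] != rows[j][a] for i, j in pairs)
                  for a in range(n_attrs)]
        best = max(range(n_attrs), key=counts.__getitem__)
        if counts[best] == 0:           # remaining pairs are inseparable
            break
        test.append(best)
        pairs = {(i, j) for i, j in pairs if rows[i][best] == rows[j][best]}
    return test

rows = [(0, 0), (0, 1), (1, 0)]
decs = [0, 1, 1]
# pairs to separate: (0,1) and (0,2); attribute 0 separates (0,2),
# attribute 1 separates (0,1), so both attributes end up selected.
```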
Extracting Relevant Information about Reduct Sets from Data Tables
Abstract
The direct search for relevant reducts in the set of all reducts of a given data table can often be computationally infeasible, especially for large data tables. Hence, there is a need for developing efficient methods for extracting relevant information about reducts from data tables which could help us to perform efficiently the process of inducing high-quality data models such as rule-based classifiers. Such relevant information could help, e.g., to reduce the dimensionality of the attribute set. We discuss methods for generating relevant information about reduct sets from information systems or decision tables. In particular, we consider a binary relation on attributes that holds for two given attributes if and only if there is no reduct containing them both. Moreover, we prove that for any fixed natural number k, there exists a polynomial-time algorithm which, for a given decision table T and given k conditional attributes, recognizes whether there exists a decision reduct of T covering these k attributes. We also present a list of problems related to the discussed issues. The reported results are a step toward the construction of a software library reducing the cost of searching for relevant reducts.
Mikhail Ju. Moshkov, Andrzej Skowron, Zbigniew Suraj
Context Algebras, Context Frames, and Their Discrete Duality
Abstract
The data structures dealt with in formal concept analysis are referred to as contexts. In this paper we study contexts within the framework of discrete duality. We show that contexts can be adequately represented by a class of sufficiency algebras called context algebras. On the logical side we define a class of context frames which are the semantic structures for context logic, a lattice-based logic associated with the class of context algebras. We prove a discrete duality between context algebras and context frames, and we develop a Hilbert style axiomatization of context logic and prove its completeness with respect to context frames. Then we prove a duality via truth theorem showing that both context algebras and context frames provide the adequate semantic structures for context logic. We discuss applications of context algebras and context logic to the specification and verification of various problems concerning contexts such as implications (attribute dependencies) in contexts, and derivation of implications from finite sets of implications.
Ewa Orłowska, Ingrid Rewitzky
A Study in Granular Computing: On Classifiers Induced from Granular Reflections of Data
Abstract
Granular Computing, as a paradigm in the area of Approximate Reasoning/Soft Computing, goes back to the idea of L. A. Zadeh (1979) of computing with collections of similar entities. Both fuzzy and rough set theory are inherently concerned with granules as atomic units of knowledge: inverse images of fuzzy membership functions in the former, and indiscernibility classes in the latter.
Research on granulation in the framework of rough set theory started soon after Zadeh’s programmatic manifesto (T. Y. Lin, L. Polkowski, Qing Liu, A. Skowron, J. Stepaniuk, Y. Y. Yao), with various tools from the general theory of binary relations (T. Y. Lin, Y. Y. Yao), rough mereology (L. Polkowski, A. Skowron), approximation spaces (A. Skowron and J. Stepaniuk), and logics for approximate reasoning (L. Polkowski, M. Semeniuk-Polkowska, Qing Liu).
The program of granular computing requires that granules formed from entities described by data should enter the computing process as elementary units of computation; this program has been pursued in some aspects of reasoning under uncertainty, such as fusion of knowledge, rough–neural computing, and multi-agent systems.
In this work, granules of knowledge are exploited in tasks of data classification. This research is a follow-up on the program initiated by the first author in plenary talks at the IEEE International Conferences on Granular Computing in Beijing, 2005, and Atlanta, 2006. The idea of this program consists in granulating data and creating a granular data set (called the granular reflection of the original data set); due to the smoothing of data, elimination of outliers, and averaging of attribute values expected in the process of granulation, classification on the basis of granular data is expected to be of satisfactory quality, i.e., granulation should preserve the information encoded in data to a satisfactory degree. It should be stressed, however, that the proposed process of building a granular structure involves a few random procedures (factoring attributes through a granule, selection of a granular covering of the universe of objects), which makes a rigorous analysis difficult.
It is the aim of this work to verify the program of granular classification on the basis of experiments with real data.
Granules of knowledge are in this work defined and computed along lines proposed by Polkowski in the framework of rough mereology: this involves the use of similarity measures called rough inclusions, along with techniques of the mereological theory of concepts. In consequence, the definitions of granules are invariant with respect to the choice of the underlying similarity measure.
Granules of knowledge enter the realm of classification problems in this work from a three-fold perspective: first, granulated data sets give rise to new data sets on which classifiers are tested, and the results are compared to results obtained with the same classifiers on the original data sets; next, granules of training objects, as well as granules of rules obtained from the training set, vote for the value of the decision at a test object; this is repeated with granules of granular reflections of granules and with granules of rules obtained from granulated data sets. Finally, the voting is augmented with weights resulting from the distribution of attribute values between the test object and the training objects.
In the first case, the rough inclusion based on Hamming’s metric is applied (or, equivalently, the rough inclusion produced from the Archimedean t-norm of Łukasiewicz); in the last two cases, rough inclusions are produced on the basis of residual implications induced from the continuous t-norms of Łukasiewicz, the product t-norm, and the minimum t-norm, respectively.
In all cases, the results of experiments on chosen real data sets, most often used as test data for rough set methods, are very satisfactory and, in some cases, better than those of many other rough set based classification methods.
Lech Polkowski, Piotr Artiemjew
On Classifying Mappings Induced by Granular Structures
Abstract
In this work the subject of granular computing is pursued beyond the content of the previous paper [21]. We study here voting on a decision by granules of training objects, granules of decision rules, granules of granular reflections of training data, and granules of decision rules induced from granular reflections of training data. This approach can be perceived as a direct mapping of the training data on test ones which is induced by granulation of knowledge on the training data. Some encouraging results were already presented in [21], and here the subject is pursued systematically.
Granules of knowledge are defined and computed according to a previously used scheme due to Polkowski in the framework of theory of rough inclusions.
On the basis of the presented results, one is justified in concluding that the presented methods offer very good classification quality, fully comparable with the best results obtained by other rough set based methods, such as templates, adaptive methods and hybrid methods.
Lech Polkowski, Piotr Artiemjew
The Neurophysiological Bases of Cognitive Computation Using Rough Set Theory
Abstract
A popular view is that the brain works in a similar way to a digital computer or a universal Turing machine, by processing symbols. Psychophysical experiments and our amazing capability to recognize complex objects (like faces) under different light and context conditions argue against symbolic representation and suggest that concept representation related to similarities may be a more appropriate model of brain function. In the present work, by looking into the anatomical and neurophysiological basis of how we classify object shapes, we propose to describe the computational properties of the brain by rough set theory (Pawlak, 1992 [1]). Concepts representing objects’ physical properties in a variable environment are weak (not precise), but psychophysical space shows precise object categorizations. We estimate the brain’s expertise in classifying an object’s components by analyzing single-cell responses in the area responsible for simple shape recognition [2]. Our model is based on the receptive field properties of neurons in different visual areas (thalamus, V1 and V4) and on the feedforward (FF) and feedback (FB) interactions between them. The FF pathways combine properties extracted in each area into a vast number of hypothetical objects by using “driver logical rules”, in contrast to the “modulator logical rules” of the FB pathways. The function of the FB pathways may help to change weak concepts of objects’ physical properties into their crisp classification in psychophysical space.
Andrzej W. Przybyszewski
Diagnostic Feature Analysis of a Dobutamine Stress Echocardiography Dataset Using Rough Sets
Abstract
Stress echocardiography is an important functional diagnostic and prognostic tool that is now routinely applied to evaluate the risk of coronary artery disease (CAD). In patients who are unable to safely undergo a stress-based test, dobutamine is administered, which provides an effect on the cardiovascular system similar to stress. In this work, a complete dataset containing data on 558 subjects from a prospective longitudinal study is employed to investigate which diagnostic features correlate with the final outcome. The dataset was examined using rough sets, which produced a series of decision rules predicting which features influence the outcomes measured clinically and recorded in the dataset. The results indicate that the ECG attribute was the most informative diagnostic feature. In addition, prehistory information has a significant impact on classification accuracy.
Kenneth Revett
Rules and Apriori Algorithm in Non-deterministic Information Systems
Abstract
This paper presents a framework of rule generation in Non-deterministic Information Systems (NISs), which follows rough sets based rule generation in Deterministic Information Systems (DISs). Our previous work on NISs coped with certain rules, minimal certain rules and possible rules. These rules are characterized by the concept of consistency. This paper relates possible rules to rules defined by the criteria of support and accuracy in NISs. On the basis of the information incompleteness in NISs, it is possible to define new criteria, i.e., minimum support, maximum support, minimum accuracy and maximum accuracy. Then, two strategies of rule generation are proposed based on these criteria. The first is the Lower Approximation strategy, which defines rule generation under the worst condition; the second is the Upper Approximation strategy, which defines rule generation under the best condition. To implement these strategies, we extend the Apriori algorithm in DISs to an Apriori algorithm in NISs. A prototype system is implemented and applied to some data sets with incomplete information.
Hiroshi Sakai, Ryuji Ishibashi, Kazuhiro Koba, Michinori Nakata
On Extension of Dependency and Consistency Degrees of Two Knowledges Represented by Covering
Abstract
Knowledge of an agent depends on the granulation procedure adopted by the agent. The knowledge granules may form a partition of the universe or a covering. In this paper, dependency degrees of two knowledges are considered in both cases. Measures of the consistency and inconsistency of knowledges are also discussed. This paper is a continuation of our earlier work [3].
P. Samanta, Mihir K. Chakraborty
A New Approach to Distributed Algorithms for Reduct Calculation
Abstract
Calculating reducts is a very important process. Unfortunately, computing all reducts is NP-hard. There are many heuristic solutions for computing reducts, but they do not guarantee obtaining the complete set of reducts. We propose here three versions of an exact algorithm designed for parallel processing, and present how to decompose the problem of calculating reducts so that parallel calculations are efficient.
Tomasz Stra̧kowski, Henryk Rybiński
From Information System to Decision Support System
Abstract
In the paper we present the definition of Pawlak’s model of an information system. The model covers information systems with history, systems with decomposition of objects or attributes, and dynamic information systems. Information systems are closely related to rough set theory and decision support systems. The aim of the paper is to characterize the research, stimulated by Professor Pawlak, of the group at the Silesian University on information retrieval based on different information systems and on decision support based on rough sets, and to outline the current research projects of this group on modern decision systems.
Alicja Wakulicz-Deja, Agnieszka Nowak
Debellor: A Data Mining Platform with Stream Architecture
Abstract
This paper introduces Debellor (www.debellor.org) – an open source extensible data mining platform with stream-based architecture, where all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables implementation of scalable algorithms, which can efficiently process large volumes of data, exceeding available memory. This is very important for data mining research and applications, since the most challenging data mining tasks involve voluminous data, either produced by a data source or generated at some intermediate stage of a complex data processing network.
Advantages of data streaming are illustrated by experiments with clustering time series. The experimental results show that even for moderate-size data sets, streaming is indispensable for the successful execution of algorithms; otherwise the algorithms run hundreds of times slower or simply crash due to memory shortage.
Stream architecture is particularly useful in such application domains as time series analysis, image recognition or mining data streams. It is also the only efficient architecture for implementation of online algorithms.
The algorithms currently available on Debellor platform include all classifiers from Rseslib and Weka libraries and all filters from Weka.
Marcin Wojnarski
Category-Based Inductive Reasoning: Rough Set Theoretic Approach
Abstract
The present paper is concerned with rough set theory (RST) and a particular approach to human-like induction, namely the similarity coverage model (SCM). It redefines basic concepts of RST – such as decision rules and the accuracy and coverage of decision rules – in the light of SCM, and explains how RST may be viewed as a similarity-based model of human-like inductive reasoning. Furthermore, following the knowledge-based theory of induction, we enrich RST with the concept of an ontology and, in consequence, present an RST-driven conceptualisation of SCM. The paper also discusses a topological representation of information systems in terms of non-Archimedean structures. It allows us to present an ontology-driven interpretation of finite non-Archimedean nearness spaces and, to some extent, to complement recent papers about RST and the topological concepts of nearness.
Marcin Wolski
Probabilistic Dependencies in Linear Hierarchies of Decision Tables
Abstract
The article is a study of probabilistic dependencies between attribute-defined partitions of a universe in hierarchies of probabilistic decision tables. The dependencies are expressed through two measures: the probabilistic generalization of Pawlak’s measure of the dependency between attributes, and the expected certainty gain measure introduced by the author. The expected certainty gain measure reflects the subtle grades of probabilistic dependence of events. Both dependency measures are developed, and it is shown how they can be extended from flat decision tables to dependencies existing in hierarchical structures of decision tables.
Wojciech Ziarko
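Pawlak’s dependency measure, which the article generalizes probabilistically, can be sketched as the fraction of the universe lying in condition classes that determine the decision uniquely; the probabilistic generalizations themselves are not shown, and the data below is a toy assumption.

```python
# Pawlak's dependency measure gamma(C, D) = |POS_C(D)| / |U| (sketch).

def gamma(objects, cond, dec):
    """objects: iterable; cond, dec: functions giving the condition/decision
    descriptions of an object. Returns |POS_C(D)| / |U|."""
    blocks = {}
    for x in objects:
        blocks.setdefault(cond(x), []).append(x)
    # a block contributes iff all its objects share one decision value
    positive = sum(len(b) for b in blocks.values()
                   if len({dec(x) for x in b}) == 1)
    return positive / len(objects)

U = [0, 1, 2, 3, 4, 5]
# condition classes by residue mod 3, decision by value < 2:
# {0,3} and {1,4} are mixed, {2,5} decides uniquely -> gamma = 2/6
g = gamma(U, cond=lambda x: x % 3, dec=lambda x: x < 2)
```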
Automatic Singing Voice Recognition Employing Neural Networks and Rough Sets
Abstract
The aim of the research study presented in this paper is the automatic recognition of a singing voice. For this purpose, a database containing sample recordings of trained and untrained singers was constructed. Based on these recordings, certain voice parameters were extracted. Two recognition categories were defined: one reflecting the skills of a singer (quality), and the other reflecting the type of the singing voice (type). The paper also presents the parameters designed especially for the analysis of a singing voice and gives their physical interpretation. Decision systems based on artificial neural networks and rough sets are used for automatic voice quality/type classification. Results obtained from both decision systems are then compared and conclusions are derived.
Paweł Żwan, Piotr Szczuko, Bożena Kostek, Andrzej Czyżewski
Hierarchical Classifiers for Complex Spatio-temporal Concepts
Abstract
The aim of the paper is to present rough set methods of constructing hierarchical classifiers for the approximation of complex concepts. Classifiers are constructed on the basis of experimental data sets and domain knowledge, mainly represented by a concept ontology. Information systems, decision tables and decision rules are the basic tools for modeling and constructing such classifiers. The general methodology presented here is applied to approximate spatial complex concepts and spatio-temporal complex concepts defined for (un)structured complex objects, to identify the behavioral patterns of complex objects, and to automated behavior planning for such objects when the states of objects are represented by spatio-temporal concepts requiring approximation. We describe the results of computer experiments performed on real-life data sets from a vehicular traffic simulator and on medical data concerning infant respiratory failure.
Jan G. Bazan
Backmatter
Metadata
Title
Transactions on Rough Sets IX
Edited by
James F. Peters
Andrzej Skowron
Henryk Rybiński
Copyright year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-89876-4
Print ISBN
978-3-540-89875-7
DOI
https://doi.org/10.1007/978-3-540-89876-4
