
2003 | Book

Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing

9th International Conference, RSFDGrC 2003, Chongqing, China, May 26–29, 2003 Proceedings

Editors: Guoyin Wang, Qing Liu, Yiyu Yao, Andrzej Skowron

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This volume contains the papers selected for presentation at the 9th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2003), held at Chongqing University of Posts and Telecommunications, Chongqing, P.R. China, May 26–29, 2003. There were 245 submissions for RSFDGrC 2003, excluding the 2 invited keynote papers and 11 invited plenary papers. Apart from the 13 invited papers, 114 papers were accepted for RSFDGrC 2003 and are included in this volume; the acceptance rate was 46.5%. On the basis of reviewer evaluations, these papers were divided into 39 regular oral presentation papers (each allotted 8 pages), 47 short oral presentation papers (each allotted 4 pages) and 28 poster presentation papers (each allotted 4 pages). Each paper was reviewed by three referees. The conference is a continuation and expansion of the International Workshops on Rough Set Theory and Applications; in particular, this was the ninth meeting in the series and the first international conference. The aim of RSFDGrC 2003 was to bring together researchers from diverse fields of expertise in order to facilitate mutual understanding and cooperation, and to support cooperative work aimed at new hybrid paradigms. It is our great pleasure to dedicate this volume to Prof. Zdzisław Pawlak, who first introduced the basic ideas and definitions of rough set theory over 20 years ago.

Table of Contents

Frontmatter

Keynote Papers

Flow Graphs and Decision Algorithms

In this paper we introduce a new kind of flow network, called a flow graph, different from those proposed by Ford and Fulkerson. Flow graphs are meant to be used as a mathematical tool for the analysis of information flow in decision algorithms, in contrast to the material flow optimization considered in classical flow network analysis. In the proposed approach, branches of the flow graph are interpreted as decision rules, while the whole flow graph can be understood as a representation of a decision algorithm. The information flow in flow graphs is governed by Bayes’ rule; however, in our case the rule does not have a probabilistic meaning and is entirely deterministic. It simply describes the distribution of information flow in flow graphs. This property can be used to draw conclusions from data without referring to its probabilistic structure.

Zdzisław Pawlak
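
As a rough illustration of the deterministic flow distribution described above, the following Python sketch (ours, not the paper's; node names and flow values are invented) computes the certainty and coverage factors that Bayes' rule induces on the branches of a flow graph, with no probabilistic interpretation required.

```python
# flow[(x, y)] = throughflow of the branch from input node x to output node y
flow = {("x1", "y1"): 30, ("x1", "y2"): 10,
        ("x2", "y1"): 20, ("x2", "y2"): 40}

def outflow(x):
    return sum(f for (a, _), f in flow.items() if a == x)

def inflow(y):
    return sum(f for (_, b), f in flow.items() if b == y)

def certainty(x, y):
    # strength of the rule "x -> y" relative to everything leaving x
    return flow[(x, y)] / outflow(x)

def coverage(x, y):
    # strength of the inverse rule "y -> x" relative to everything reaching y
    return flow[(x, y)] / inflow(y)

for x, y in flow:
    print(f"{x}->{y}: cer={certainty(x, y):.2f} cov={coverage(x, y):.2f}")
```
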
The Quotient Space Theory of Problem Solving

The talk introduces a framework for the quotient space theory of problem solving. In the theory, a problem (or problem space) is represented as a triplet: the universe, its structure, and its attributes. Worlds with different grain sizes are represented by a set of quotient spaces. The basic characteristics of different grain-size worlds are presented. Based on the model, the computational complexity of hierarchical problem solving is discussed.

Ling Zhang, Bo Zhang

Plenary Papers

Granular Computing
Structures, Representations, and Applications

The structure and representation theories of (crisp/fuzzy) granulations are presented. The results are applied to data mining, fuzzy control, and security, among other areas.

Tsau Young Lin
Rough Sets: Trends and Challenges
Extended Abstract

We discuss how approximation spaces considered in the context of rough sets and information granule theory have evolved over the last 20 years from simple approximation spaces to more complex spaces. Some research trends and challenges for the rough set approach are outlined in this paper. The study of the evolution of approximation space theory and applications is considered in the context of rough sets introduced by Zdzisław Pawlak and the notions of information granulation and computing with words formulated by Lotfi Zadeh. The deepening of our understanding of information granulation and the introduction of new approaches to concept approximation, pattern identification, pattern recognition, pattern languages, clustering, information granule systems, and inductive reasoning have been aided by the introduction of a calculus of information granules based on rough mereology. Central to rough mereology is the inclusion relation “to be a part to a degree”. This calculus has grown out of an extension of what S. Leśniewski called mereology (the study of what it means to be a part of something).

Andrzej Skowron, James F. Peters
A New Development on ANN in China — Biomimetic Pattern Recognition and Multi Weight Vector Neurons

A new model of pattern recognition principles, Biomimetic Pattern Recognition, based on “matter cognition” instead of “matter classification”, has been proposed. As an important means of realizing Biomimetic Pattern Recognition, the mathematical model and analysis method of ANNs achieve a breakthrough: a novel all-purpose mathematical model has been advanced, which can simulate all kinds of neuron architectures, including RBF and BP models. At the same time, this model has been realized in hardware, and the high-dimensional space geometry method, a new means of analyzing ANNs, has been investigated.

Shoujue Wang
On Generalizing Rough Set Theory

This paper summarizes various formulations of the standard rough set theory. It demonstrates how those formulations can be adopted to develop different generalized rough set theories. The relationships between rough set theory and other theories are discussed.

Y. Y. Yao
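
For reference, here is a minimal Python sketch of the standard Pawlak approximations that such generalizations start from; the universe, attribute, and target set below are illustrative only.

```python
def partition(universe, attr):
    # partition the universe into equivalence classes by attribute value
    blocks = {}
    for x in universe:
        blocks.setdefault(attr(x), set()).add(x)
    return list(blocks.values())

def lower(X, blocks):
    # union of classes entirely contained in X: objects certainly in X
    return set().union(*(B for B in blocks if B <= X))

def upper(X, blocks):
    # union of classes that intersect X: objects possibly in X
    return set().union(*(B for B in blocks if B & X))

U = set(range(10))
blocks = partition(U, lambda x: x % 3)      # indiscernibility by residue mod 3
X = {0, 1, 2, 3, 6, 9}
print(lower(X, blocks), upper(X, blocks))   # {0, 3, 6, 9} and the whole of U
```
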
Dual Mathematical Models Based on Rough Approximations in Data Analysis

In the rough set approach, rough approximations, called lower and upper approximations, have been discussed. This concept can be extended into a new research field of data analysis. The proposed approach to data modeling is to obtain dual mathematical models by using a concept similar to rough sets. The dual models, called lower and upper models, have an inclusion relation. In other words, the proposed method can be described as two approximations to a phenomenon under consideration such that Lower Model ⊆ Phenomenon ⊆ Upper Model. Thus, the lower and upper models are obtained as the greatest lower bound and the least upper bound, respectively. This property is illustrated by interval regression models, which are not crisp but have an interval relationship between inputs and outputs. Generally, the lower and upper models are formulated by greatest lower and least upper bounds, respectively. The given phenomenon can be expressed by the pair (lower model, upper model) corresponding to rough approximations.

Hideo Tanaka
Knowledge Theory: Outline and Impact

Knowledge is widely regarded as the most valuable wealth, but no systematic theory of knowledge has existed up to the present time. An attempt is thus made in this paper to present the results of a preliminary study of Knowledge Theory. The results reported here contain three parts: the fundamentals of the theory; the main body of the theory, namely the mechanism of knowledge creation from information and the mechanism of intelligence creation from knowledge; and the impact that Knowledge Theory may have.

Y. X. Zhong
A Rough Set Paradigm for Unifying Rough Set Theory and Fuzzy Set Theory

In this plenary address, we discuss rough inclusions defined in Rough Mereology, a joint idea with A. Skowron, as a basis for common models for rough as well as fuzzy set theories. We justify the point of view that tolerance (or similarity) is the leading motif common to both theories, and that it is in this area that the paths between the two lie.

Lech Polkowski
Extracting Structure of Medical Diagnosis: Rough Set Approach

One of the most important problems with rule induction methods is that they cannot extract rules that plausibly represent experts’ decision processes. This is because rule induction methods induce probabilistic rules that discriminate between a target concept and other concepts, assuming that all the concepts are on the same level. However, medical experts assume that the concepts of diseases belong to different levels of a hierarchy. In this paper, the characteristics of experts’ rules are closely examined and a new approach to extracting plausible rules is introduced, which consists of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases and the concept hierarchy for the given classes is calculated. Second, based on the hierarchy, rules for each hierarchical level are induced from data. Then, for each given class, the rules for all the hierarchical levels are integrated into one rule. The proposed method was evaluated on a medical database; the experimental results show that the induced rules correctly represent experts’ decision processes.

Shusaku Tsumoto
A Kind of Linearization Method in Fuzzy Control System Modeling

A linearization method in fuzzy control system modeling is proposed in order to deal with nonlinear models with variable coefficients. The method can turn a nonlinear model with variable coefficients into a linear model with variable coefficients by changing the membership functions of the fuzzy sets in the fuzzy partitions of the universes from triangle waves into rectangle waves. However, the linearization models are incomplete in form because they lack some terms. To solve this problem, joint approximation by linear models is introduced. The simulation results show that the marginal linearization models achieve higher approximation precision than the original nonlinear models.

Hongxing Li, Jiayin Wang, Zhihong Miao
A Common Framework for Rough Sets, Databases, and Bayesian Networks

It has been pointed out that there exists an intriguing relationship between propositional modal logic and rough sets [8, 2]. In this paper, we use first order modal logic (FOML) to formulate a common framework for rough sets, databases, and Bayesian networks. The relational view of the semantics of first order modal logic provides a unified interpretation of many related concepts.

S. K. M. Wong, D. Wu
Rough Sets, EM Algorithm, MST and Multispectral Image Segmentation

Segmentation is a process of partitioning an image space into nonoverlapping, meaningful, homogeneous regions. The success of an image analysis system depends on the quality of segmentation. Two broad approaches to the segmentation of remotely sensed images are gray level thresholding and pixel classification [1]. The multispectral nature of most remote sensing images makes pixel classification the natural choice for segmentation.

Sankar K. Pal, Pabitra Mitra

Rough Sets Foundations and Methods

Rough Mereology: A Survey of New Developments with Applications to Granular Computing, Spatial Reasoning and Computing with Words

In this paper, we present a survey of new developments in rough mereology, i.e. approximate calculus of parts, an approach to reasoning under uncertainty based on the notion of an approximate part (part to a degree) along with pointers to its main applications. The paradigms of Granule Computing (GC), Computing with Words (CWW) and Spatial Reasoning (SR) are particularly suited to a unified treatment by means of Rough Mereology (RM).

Lech Polkowski
A New Rough Sets Model Based on Database Systems

In this paper we present a new rough set model based on database systems. We borrow the main ideas of the original rough set theory and redefine them based on database theory to take advantage of very efficient set-oriented database operations. We present a new set of algorithms to compute the core and reducts based on our database-based rough set model; almost all the operations used in generating them can be performed using database set operations such as Count and Projection. Because our rough set model is built on database set operations, it is very efficient and scalable compared with traditional rough set models.

Xiaohua Tony Hu, T. Y. Lin, Jianchao Han
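
A hypothetical sketch of the Count/Projection idea, with pandas standing in for the relational engine and assuming a consistent decision table (the data and column names are invented): a condition attribute belongs to the core exactly when dropping it merges objects that carry different decisions, which can be tested purely with projections and distinct-row counts.

```python
import pandas as pd  # pandas stands in for the set-oriented database engine

# toy decision table; columns a, b, c are conditions, d is the decision
df = pd.DataFrame({
    "a": [0, 0, 1, 1], "b": [0, 1, 0, 1],
    "c": [0, 0, 1, 1], "d": [0, 1, 0, 1],
})
conditions, decision = ["a", "b", "c"], "d"

def is_core(attr):
    rest = [c for c in conditions if c != attr]
    # Projection + Count: for a consistent table, attr is in the core iff the
    # count of distinct rows over rest + decision exceeds the count over rest,
    # i.e. dropping attr makes some indiscernible objects disagree on d.
    return df[rest + [decision]].drop_duplicates().shape[0] \
         > df[rest].drop_duplicates().shape[0]

core = [a for a in conditions if is_core(a)]
print("core:", core)  # here only 'b' distinguishes decisions, so core == ['b']
```
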
A Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm

Incremental learning, a distinctive way in which human brains learn new knowledge, is an important topic in AI. A goal of many AI researchers is to find an algorithm that can learn new knowledge quickly, based on knowledge learned before, and whose acquired knowledge is efficient in real use. In this paper, we develop a rough set and rule tree based incremental knowledge acquisition algorithm that can learn from a domain data set incrementally. Our simulation results show that our algorithm can learn more quickly than classical rough set based knowledge acquisition algorithms, and that the performance of the knowledge it learns can be the same as, or even better than, that of classical rough set based knowledge acquisition algorithms. Besides, the simulation results also show that our algorithm outperforms ID4 in many aspects.

Zheng Zheng, Guoyin Wang, Yu Wu
Comparison of Conventional and Rough K-Means Clustering

This paper compares the results of clustering obtained using a modified K-means algorithm with the conventional clustering process. The modifications to the K-means algorithm are based on the properties of rough sets. The resulting clusters are represented as interval sets. The paper describes results of experiments used to create conventional and interval set representations of clusters of web users on three educational web sites. The experiments use secondary data consisting of access logs from the World Wide Web. This type of analysis is called web usage mining, which involves applications of data mining techniques to discover usage patterns from the web data. Analysis shows the advantages of the interval set representation of clusters over conventional crisp clusters.

Pawan Lingras, Rui Yan, Chad West
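
A hedged sketch of the interval-set assignment step in rough K-means as we understand it from the modification described above; the threshold eps and the data layout are our assumptions, not the paper's code.

```python
import numpy as np

def rough_assign(points, centroids, eps=0.1):
    # An object near a single centroid goes to that cluster's lower
    # approximation; an object nearly equidistant to several centroids goes
    # only to their upper approximations, yielding interval-set clusters.
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for idx, x in enumerate(points):
        d = [np.linalg.norm(x - c) for c in centroids]
        i = int(np.argmin(d))
        ties = [j for j in range(k) if j != i and d[j] - d[i] <= eps]
        upper[i].add(idx)
        if ties:                      # ambiguous: boundary region only
            for j in ties:
                upper[j].add(idx)
        else:                         # unambiguous: lower approximation too
            lower[i].add(idx)
    return lower, upper
```
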
An Application of Rough Sets to Monk’s Problems Solving

In this paper, the main techniques of inductive machine learning are united with the knowledge reduction theory based on rough sets from a theoretical point of view. The Monk’s problems are then solved again employing rough sets. As far as accuracy and conciseness are concerned, the learning algorithms based on rough sets show remarkable superiority over the previous methods.

Duoqian Miao, Lishan Hou
Pre-topologies and Dynamic Spaces

Approximation Spaces were introduced in order to analyse data on the basis of Indiscernibility Spaces, that is, spaces of the form 〈U, E〉, where U is the universe of data and E is an equivalence relation on U. Various authors suggested considering spaces of the form 〈U, R〉, where R is any relation. This paper introduces a further step, spaces of the form 〈U, {R_i}_{i∈I}〉, where {R_i}_{i∈I} is a family of relations on U, which we call “Dynamic Spaces” because they make it possible to account for different forms of dynamics. While Indiscernibility Spaces induce 0-dimensional topological spaces (Approximation Spaces), Dynamic Spaces induce various types of pre-topological spaces.

Piero Pagliani
Rough Sets and Gradual Decision Rules

We propose a new fuzzy rough set approach which, differently from all known fuzzy set extensions of rough set theory, does not use any fuzzy logical connectives (t-norm, t-conorm, fuzzy implication). As there is no rationale for a particular choice of these connectives, avoiding this choice reduces the arbitrariness of the fuzzy rough approximation. Another advantage of the new approach is that it is based only on the ordinal property of fuzzy membership degrees. The concepts of fuzzy lower and upper approximations are thus proposed, creating a basis for the induction of fuzzy decision rules having the syntax and semantics of gradual rules.

Salvatore Greco, Masahiro Inuiguchi, Roman Słowiński
Explanation Oriented Association Mining Using Rough Set Theory

This paper presents a new philosophical view and methodology for data mining. A framework of explanation oriented data mining is proposed and studied with respect to association mining. The notion of conditional associations is adopted, which explicitly expresses the conditions under which an association occurs. To illustrate the basic ideas, the theory of rough sets is used to construct explanations.

Y. Y. Yao, Y. Zhao, R. B. Maguire
Probabilistic Rough Sets Characterized by Fuzzy Sets

In this paper, fuzziness in probabilistic rough sets is studied by means of fuzzy sets. We show that the variable precision approximation of a probabilistic rough set can be generalized from the vantage point of the cuts of a fuzzy set determined by the rough membership function. As a result, the fuzzy set can be used conveniently to describe the features of a rough set. Moreover, we give a measure of fuzziness, fuzzy entropy, induced by roughness in a probabilistic rough set, and make some characterizations of this measure. For three well-known entropy functions, we show that the finer the information granulation is, the less the fuzziness in a rough set. The superiority of fuzzy entropy to Pawlak’s accuracy measure is illustrated with examples. Finally, the fuzzy entropy of a rough classification is defined by the fuzzy entropy of the corresponding rough sets, and we show that one possible application of it is to measure the inconsistency in a decision table.

Li-Li Wei, Wen-Xiu Zhang
A View on Rough Set Concept Approximations

The concept of approximation is one of the most fundamental in rough set theory. In this work we examine this basic notion as well as its extensions and modifications. The goal is to construct a parameterised approximation mechanism making it possible to develop multi-stage, multi-level concept hierarchies that are capable of maintaining an acceptable level of imprecision from input to output.

Jan Bazan, Nguyen Hung Son, Andrzej Skowron, Marcin Szczuka
Evaluation of Probabilistic Decision Tables

The article presents the basic notions of the variable precision rough set model (VPRSM). The main subject of the article is the evaluation of VPRSM set approximations and corresponding probabilistic decision tables using a number of proposed probabilistic measures.

Wojciech Ziarko
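
The following Python sketch illustrates VPRSM-style set approximations with a pair of precision bounds l and u (our parameterization for illustration; the original VPRS used a single precision parameter): each equivalence class is routed to the positive, negative, or boundary region by its conditional probability.

```python
def vprsm_regions(partition, X, l=0.25, u=0.75):
    # partition: equivalence classes of U; X: target set; l < u precision bounds
    pos, neg, bnd = set(), set(), set()
    for E in partition:
        p = len(E & X) / len(E)   # rough membership / conditional probability
        if p >= u:
            pos |= E              # class confidently inside X
        elif p <= l:
            neg |= E              # class confidently outside X
        else:
            bnd |= E              # boundary: undecided at this precision
    return pos, neg, bnd
```
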
Query Answering in Rough Knowledge Bases

We propose a logic programming language that makes it possible to define and to reason about rough sets. In particular, we show how to test for rough inclusion and rough equality. This extension of our previous work [7] is motivated by the need for these concepts in practical applications.

Aida Vitória, Carlos Viegas Damásio, Jan Małuszyński
Upper and Lower Recursion Schemes in Abstract Approximation Spaces

An approximation space (U, R) placed in a type-lowering retraction with 2^(U×U) provides a model for a first order calculus of relations for computing over lists and reasoning about the resulting programs. Upper and lower approximations to the scheme of primitive recursion of the Theory of Pairs are derived from the approximation operators of an abstract approximation space (U, ⋄ : u ↦ ⋃[u]_R, □ : u ↦ ⋂[u]_R).

Peter Apostoli, Akira Kanda
Adaptive Granular Control of an HVDC System: A Rough Set Approach

This article reports the results of a three-year study of adaptive granular control of High-Voltage Direct Current (HVDC) systems using a combination of rough sets and granular computing techniques. A proportional integral (PI) control strategy is commonly used for constant current and extinction angle control in an HVDC system. A PI control strategy is based on a static design where the gains of a PI controller are fixed. Since the response of an HVDC plant changes dynamically with variations in the operating point, a PI controller’s performance is far from optimal. By contrast, an adaptive controller changes the gains in response to observed changes in HVDC system behavior. However, adaptive controllers require, for their design, a frequency-domain model of the controlled plant. Due to the non-linear operation of the HVDC system, such a model is difficult to establish. Because rough set theory makes it possible to set up a decision-making utility that approximates a control engineer’s knowledge about how to tune the controller of a system to improve its behavior, rough sets can be used to design an adaptive controller for the HVDC system. The contribution of this paper is the presentation of the design of a rough set based, granular control scheme. Experimental results that compare the performance of the adaptive control and PI control schemes are also given.

J. F. Peters, H. Feng, S. Ramanna
Rough Set Approach to Domain Knowledge Approximation

Classification systems working on large feature spaces, despite extensive learning, often perform poorly on a group of atypical samples. The problem can be dealt with by incorporating domain knowledge about the samples being recognized into the learning process. We present a method that allows this task to be performed within a rough approximation framework. We show how a human expert’s domain knowledge, expressed in natural language, can be approximately translated by a machine learning recognition system. We present in detail how the method performs in a system recognizing handwritten digits from a large digit database. Our approach is an extension of ideas developed in rough mereology theory.

Tuan Trung Nguyen, Andrzej Skowron
Reasoning Based on Information Changes in Information Maps

We discuss basic concepts for approximate reasoning about information changes. Any rule for reasoning about information changes specifies how changes of information granules from the rule premise influence changes of information granules from the rule conclusion. Changes in information granules can be measured, e.g., using expressions analogous to derivatives. We illustrate our approach by means of information maps and information granules defined in such maps.

Andrzej Skowron, Piotr Synak
Characteristics of Accuracy and Coverage in Rule Induction

Rough set analysis is closely related to accuracy and coverage. However, the formal characteristics of accuracy and coverage for rule induction received little attention until Tsumoto showed several of their characteristics. In this paper, the following characteristics of accuracy and coverage are further investigated: (1) the higher the accuracy of a conjunctive formula becomes, the lower the effect of further conjunction becomes; (2) coverage decreases more rapidly than accuracy; (3) the change in coverage becomes very small as the length of the conjunctive formula grows; (4) the discussion above corresponds to that on sensitivity and specificity; (5) when we focus on accurate classification, the classification efficiency, which is the product of sensitivity and specificity, becomes lower.

Shusaku Tsumoto
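
For concreteness, a minimal sketch of the two quantities under discussion, with invented object sets: for a rule whose premise holds on R and a target class D, accuracy is |R ∩ D|/|R| and coverage is |R ∩ D|/|D|.

```python
def accuracy(R, D):
    # fraction of objects matching the rule that belong to the class
    return len(R & D) / len(R)

def coverage(R, D):
    # fraction of the class captured by the rule
    return len(R & D) / len(D)

R = {1, 2, 3, 4}        # objects satisfying the rule's premise
D = {3, 4, 5, 6, 7}     # objects in the target class
print(accuracy(R, D), coverage(R, D))   # 0.5, 0.4
```
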
Interpretation of Rough Neural Networks as Emergent Model

The need for more effective methods to generate and maintain global nonfunctional properties suggests an approach analogous to the way natural processes generate emergent properties. An emergent model allows the constraints of the task to be represented more naturally and permits only pertinent, task-specific knowledge to emerge in the course of solving the problem. The paper describes some basics of emergent phenomena and their implementation in rough hybrid systems.

Yasser Hassan, Eiichiro Tazaki
Using Fuzzy Dependency-Guided Attribute Grouping in Feature Selection

Feature selection has become a vital step in many machine learning techniques due to their inability to handle high dimensional descriptions of input features. This paper demonstrates the applicability of fuzzy-rough attribute reduction and fuzzy dependencies to the problem of learning classifiers, resulting in simpler rules with little loss in classification accuracy.

Richard Jensen, Qiang Shen
Conjugate Information Systems: Learning Cognitive Concepts in Rough Set Theory

The problem of concept learning [1] via rough sets has been recently discussed in literature e.g. cf. [6], [3], [7]. To formally study this problem, we introduce a notion of a conjugate information system cf. [3].

Maria Semeniuk-Polkowska, Lech Polkowski
A Rule Induction Method of Plant Disease Description Based on Rough Sets

Knowledge acquisition is the bottleneck in developing expert systems; it usually takes a long time to acquire plant disease knowledge using traditional methods. Addressing this problem, this paper describes the relations between rough set theory and rule-based descriptions of plant diseases. The exclusive rules, inclusive rules and disease images of rapeseed diseases are then built based on the PDES diagnosis model, and a definition of probabilistic rules is put forward. Finally, the paper presents the rule-based automated induction reasoning method, including exhaustive search, a post-processing procedure, estimation by statistical tests, and bootstrap and resampling methods. We also introduce automated induction of the rule-based description, which is used in our plant disease diagnostic expert system. The experimental results show that this method induces diagnostic rules correctly.

Ai-Ping Li, Gui-Ping Liao, Quan-Yuan Wu
Rough Set Data Analysis Algorithms for Incomplete Information Systems

Rough set theory is a relatively new soft computing tool for dealing with vagueness and uncertainty in databases. To apply the theory, it is important to associate it with effective computational methods. In this paper, we focus on the development of algorithms for incomplete information systems and their time and space complexity. In particular, using a measure of attribute significance that we define, we present a heuristic algorithm for computing a minimal reduct; its time complexity is O(|A|³|U|²) and its space complexity is O(|A||U|). The minimal reduct algorithm is very useful for knowledge discovery in databases.

K. S. Chin, Jiye Liang, Chuangyin Dang
Inconsistency Classification and Discernibility-Matrix-Based Approaches for Computing an Attribute Core

In this paper, we first introduce a concept of inconsistency classification, based on which we draw the qualitative conclusion that the approach of Hu and Cercone for computing an attribute core based on Skowron’s discernibility matrix is correct for both consistent and partially inconsistent decision tables, but may fail for entirely inconsistent ones. Secondly, we improve the work of Zhi and Miao concerning the computation of core attributes by defining a new binary discernibility matrix. Finally, as another application of inconsistency classification, we show that an attribute core from the algebra view is equivalent to that from the information view not only for consistent but also for partially inconsistent decision tables.

Dongyi Ye, Zhaojiong Chen
Multi-knowledge Extraction and Application

Rough set theory provides approaches to finding a reduct (informally, an identifying set of attributes) from a decision system or a training set. In this paper, an algorithm for finding multiple reducts is developed. The algorithm has been used to find multi-reducts in data sets from the UCI Machine Learning Repository. The experiments show that many databases in the real world have multiple reducts. Using the multi-reducts, multi-knowledge is defined and an approach for its extraction is presented. It is shown that a robot with multi-knowledge has the ability to identify a changing environment. Multi-knowledge can be applied in many areas in the machine learning and data mining domains.

QingXiang Wu, David Bell
Multi-rough Sets Based on Multi-contexts of Attributes

Rough set theory deals with crisp granularity of objects given a data table, called an information system, as a pair I = (U, A), where U is a universal set of objects and A is a non-empty finite set of attributes. We may consider A as a set of contexts of attributes, where each A_i ∈ A is a set of attributes regarded as a context or background. Consequently, if there are n contexts in A, where A = {A_1, ..., A_n}, it provides n partitions. A given set of objects, X ⊆ U, may then be represented by n pairs of lower and upper approximations, denoted as multi-rough sets of X. Some properties and operations are proposed and examined.

Rolly Intan, Masao Mukaidono
Approaches to Approximation Reducts in Inconsistent Decision Tables

In this paper, two new concepts, lower approximation reduction and upper approximation reduction, are introduced. A lower approximation reduction is the smallest attribute subset that preserves the lower approximations of all decision classes, and an upper approximation reduction is the smallest attribute subset that preserves the upper approximations of all decision classes. For an inconsistent decision table (DT), an upper approximation consistent set must be a lower approximation consistent set, but the converse is not true; for a consistent DT, they are equivalent. After giving their equivalent definitions, we examine the judgement theorems and discernibility matrices associated with the two reducts, from which we obtain approaches to knowledge reduction in inconsistent decision tables.

Ju-Sheng Mi, Wei-Zhi Wu, Wen-Xiu Zhang
Degree of Dependency and Quality of Classification in the Extended Variable Precision Rough Sets Model

In this paper an investigation of the utilisation of the degree of dependency and quality of classification measures in the extended variable precision rough sets model is undertaken. The use of (l, u)-graphs enables these measures to aid in the classification of objects to a number of categories for a choice of l and u values and the selection of an (l, u)-reduct.

Malcolm J. Beynon
Approximate Reducts of an Information System

Rough set theory is a tool for data mining. From an information system, we find reducts to generate decision rules for classification. However, when an information system contains noise, reducts may become meaningless or inappropriate for classification. In this paper, we propose some indices for finding approximate reducts. To define indices that measure subsets of attributes, we introduce a contingency matrix based on the number of objects in each class of the information system. The main advantage of using the contingency matrix is that it has some good properties for finding approximate reducts.

Tien-Fang Kuo, Yasutoshi Yajima
A Rough Set Methodology to Support Learner Self-Assessment in Web-Based Distance Education

With the prevalence and explosive growth of distance education via the World Wide Web, many efforts are dedicated to making distance education more effective. We present a Rough Set model to provide an instrument for learner self-assessment when taking courses delivered via the World Wide Web. The Rough Set Based Inductive Learning Algorithm generates definite and probabilistic (general) rules, which are used to provide feedback to learners.

Hongyan Geng, Brien Maguire
A Synthesis of Concurrent Systems: A Rough Set Approach

The synthesis problem has been discussed in the literature for various types of formalisms. Our approach is based on rough set theory and Petri nets. In the paper, information systems are used to represent knowledge about the modeled concurrent system, and coloured Petri nets are chosen as the model for concurrency. The paper provides an algorithm for constructing a model of a given concurrent system in the form of a net. The net construction consists of two stages. In the first stage, all dependencies between local states of processes in the system, represented by means of minimal rules, are extracted. In the second stage, a coloured Petri net corresponding to these dependencies is built. Our approach uses a new method for generating the minimal rules in order to solve the synthesis problem considered here. It can be used to deal with various problems arising in the design of automated systems. The method proposed in the paper has been implemented in the ROSECON system, running on IBM PC computers under the Windows operating system; this system is continually evolving. In the paper we assume that the reader is familiar with the basic notions and notation of rough set theory as well as coloured Petri nets.

Zbigniew Suraj, Krzysztof Pancerz
Towards a Line-Crawling Robot Obstacle Classification System: A Rough Set Approach

The basic contribution of this paper is the presentation of two methods that can be used to design a practical robot obstacle classification system based on data mining methods from rough set theory. These methods incorporate recent advances in rough set theory related to coping with the uncertainty in making obstacle classification decisions during the operation of a mobile robot. Obstacle classification is based on the evaluation of data acquired by proximity sensors connected to a line-crawling robot useful in inspecting power transmission lines. A fairly large proximity sensor data set has been used as a means of benchmarking the proposed classification methods, and also to facilitate comparison with other published studies of the same data set. Using a 10-fold cross-validated paired t-test, this paper compares the rough set classification learning method with the Waikato Environment for Knowledge Analysis (WEKA) classification learning method.

James F. Peters, Sheela Ramanna, Marcin S. Szczuka
Order Based Genetic Algorithms for the Search of Approximate Entropy Reducts

We use entropy to extend the rough set based notion of a reduct. We show that order based genetic algorithms, applied to the search for classical decision reducts, can be used in exactly the same way in the case of extracting optimal approximate entropy reducts from data.

Dominik Ślęzak, Jakub Wróblewski
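
A hedged sketch of the entropy measure that drives such reducts (the table, attribute names, and tolerance criterion are illustrative assumptions): a condition subset B is an approximate entropy reduct candidate when the conditional entropy H(d | B) is close to that of the full condition set.

```python
from collections import Counter, defaultdict
from math import log2

def cond_entropy(rows, B, d):
    # H(d | B): rows is a list of dicts, B a tuple of condition attributes
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in B)][r[d]] += 1
    n = len(rows)
    h = 0.0
    for dec in groups.values():
        size = sum(dec.values())
        h -= (size / n) * sum((c / size) * log2(c / size)
                              for c in dec.values())
    return h

rows = [{"a": 0, "b": 0, "d": 0}, {"a": 0, "b": 1, "d": 1},
        {"a": 1, "b": 0, "d": 0}, {"a": 1, "b": 1, "d": 1}]
print(cond_entropy(rows, ("b",), "d"))  # 0.0: b alone determines d
```
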
Variable Precision Bayesian Rough Set Model

We present a parametric extension of the Bayesian Rough Set (BRS) model. Its properties are investigated in relation to non-parametric BRS, classical Rough Set (RS) model and the Variable Precision Rough Set (VPRS) model.

Dominik Ślęzak, Wojciech Ziarko
Linear Independence in Contingency Table

A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes depend on each other. Thus, this table is a fundamental tool for pattern discovery with conditional probabilities, such as rule discovery. In this paper, a contingency table is interpreted from the viewpoint of granular computing. The first important observation is that contingency tables compare two attributes with respect to granularity: an n×n table compares two attributes with the same granularity, while an m×n (m ≥ n) table can be viewed as a projection from m partitions to n partitions. The second important observation is that matrix algebra is a key tool for the analysis of such tables; in particular, the degree of independence, the rank, plays a very important role in extracting a probabilistic model from a given contingency table.

Shusaku Tsumoto
The Information Entropy of Rough Relational Databases

Beaubouef, Petry and Buckles proposed generalized rough set database analysis (GRSDA) to discuss rough relational databases. Given any rough relational database (U, A) and an attribute a ∈ A, a definition of the lower and upper approximations based on a similarity relation φ_a is given, as in rough set theory. The entropy and conditional entropy of similarity relations in a rough relational database are defined. Examples show that the entropy of a similarity relation does not decrease as the similarity relation is refined. It is proved that, given any two similarity relations φ and ψ, defined by a set C of condition attributes and a decision attribute d, respectively, if d similarly depends on C in a rough relational database, then the conditional entropy of φ with respect to ψ is equal to the entropy of φ.

Yuefei Sui, Youming Xia, Ju Wang
A T-S Type of Rough Fuzzy Control System and Its Implementation

A new type of rough fuzzy controller and its design method are presented, showing how rough logic is combined with fuzzy inference. In this approach, rough set theory is used to derive the minimal set of rules from input-output data; then, by complementing the information of the control output corresponding to the rough reduced rules, a T-S type rough fuzzy control system is constructed. This solves the problem that the number of rules in a fuzzy controller increases exponentially with the number of variables involved.

Jinjie Huang, Shiyong Li, Chuntao Man
Rough Mereology in Knowledge Representation

The rough mereology proposed by Polkowski and Skowron is used in complex systems, multi-agent systems, knowledge engineering and knowledge representation. The function μ makes rough mereology more like fuzzy mereology. A new rough mereology is proposed, in which rough inclusion is defined entirely in terms of the upper and lower approximations of rough sets. The basic properties of this rough mereology and its applications in knowledge representation and knowledge engineering are discussed.

Cungen Cao, Yuefei Sui, Zaiyue Zhang
Rough Set Methods for Constructing Support Vector Machines

Having analyzed the commonalities and particularities of Rough Set Theory (RST) and Support Vector Machines (SVM) in knowledge representation and regression, we investigate a minimum decision network combining RST with SVM for intelligent processing, and propose a kind of RST-based SVM information processing system for forecasting. Using RST’s advantage in handling large amounts of data and eliminating redundant information, the system reduces the SVM training data and overcomes the disadvantages of large data volume and slow training speed. The experimental results show that the presented approach achieves greater forecasting accuracy and generalization ability than the BP neural network and the standard SVM.

Yuancheng Li, Tingjian Fang
The Lattice Property of Fuzzy Rough Sets

In this paper we modify a result in [2], discuss the lattice property of fuzzy rough sets, and introduce a lattice isomorphism from the equivalence classes of fuzzy rough sets to intuitionistic fuzzy sets. The result of this paper extends the corresponding conclusion in [2] and can be applied in the area of computer science.

Fenglan Xiong, Xiangqian Ding, Yuhai Liu
Querying Data from RRDB Based on Rough Sets Theory

Based on the RRDM (Rough Relational Database Model), the querying theory of RRDB (Rough Relational Databases) is analyzed in terms of the decomposition principle, the projection principle and the definability of RRDB. We divide rough data querying into three types: crisp querying, rough complete querying and rough combinatorial querying. In addition, we discuss rough data querying from these three aspects and perform computational simulations whose results are in good agreement with the conclusions of this paper.

Qiusheng An, Guoyin Wang, Junyi Shen, Jiucheng Xu
An Inference Approach Based on Rough Sets

In this paper we present an inference approach, based on rough set theory, for inducing rules from examples. The feature of the inference approach lies in combining a criterion of dependency degree with decision makers’ prior knowledge in selecting the attributes of objects. Since each rule can have several reductions, an algorithm is used to implement a proper reduction and select the most effective attribute subset. As a result, a practical and effective reduced knowledge rule set can be obtained.

Fuyan Liu, Shaoyi Lu
Classification Using the Variable Precision Rough Set

In this paper, we present a new version of the discernibility matrix, the so-called variable precision discernibility matrix, which can tolerate noise in the information. In addition, a reduction algorithm based on the partial order relation of the condition attributes is presented and used for image classification. Compared with the traditional rough set approach and a BP network, the results are much better.

Yongqiang Zhao, Hongcai Zhang, Quan Pan
An Illustration of the Effect of Continuous Valued Discretisation in Data Analysis Using VPRSβ

This paper explores the effect of the results of continuous valued discretisation (CVD) of condition attributes in an analysis using the variable precision rough sets model (VPRSβ). With the utilisation of an ordered criterion for the identification of an associated β-reduct, a ‘leave n out’ approach is undertaken in the subsequent analysis. A small example problem is considered, for which three alternative methods of CVD are used to intervalise the continuous condition attributes. For each CVD approach used, 1500 decision tables are constructed and a VPRSβ analysis undertaken.

Malcolm J. Beynon

Fuzzy Sets and Systems

Application of Fuzzy Control Based on Changeable Universe to Superheated Steam Temperature Control System

Cascade control systems are generally adopted in power plant units, as in the application shown in this paper. The main controller employs the FLC based on the changeable universe of discourse that is designed in the paper. According to the inputs of the main FLC, the paper introduces a fuzzy variable α to automatically adjust the range of the universe of discourse and simulate the human fuzzy control process from rough to exact control. The improved fuzzy logic controller has some self-adaptive adjusting capability. It is combined with a PI controller in the cascade control system and applied to the superheated temperature control system of a 1900 t/h once-through boiler. Comparison of this scheme with traditional cascade PID control and a conventional FLC demonstrates its efficacy.

Keming Xie, Fang Wang, Gang Xie, T. Y. Lin
Application of Fuzzy Support Vector Machines in Short-Term Load Forecasting

A new method using Fuzzy Support Vector Machines (FSVM) is presented for Short-Term Load Forecasting (STLF). In many regression problems, the effects of the training points are different; often, some training points are more important than others. In FSVM, we apply a fuzzy membership to each input point so that different input points can make different contributions to the learning of the decision surface. The experimental results indicate that FSVM is effective in improving the accuracy of STLF.

Yuancheng Li, Tingjian Fang
A Symbolic Approximate Reasoning

We study knowledge-based systems using symbolic many-valued logic. In previous papers we proposed a symbolic representation of nuanced statements. First, we introduced a symbolic concept whose role is similar to that of the membership function in a fuzzy context. Using this concept, we defined linguistic modifiers. In this paper, we propose new deduction rules dealing with nuanced statements. More precisely, we present new Generalized Modus Ponens rules within a many-valued context.

Mazen El-Sayed, Daniel Pacholczyk
Intuition in Soft Decision Analysis

The notions of fuzzy sets and intuitionistic fuzzy sets (IFSs) play significant roles in knowledge representation problems where mathematical complexity and practical inconvenience arise from our inability to differentiate events exactly in real situations, and thus to define instrumental notions in precise form. This paper discusses the notions of fuzzy sets and IFSs and describes how these tools can be used to explain uncertainty as well as to model intuitionistic behavioural patterns occurring in decision analysis and systems design problems. ...

Kankana Chakrabarty
Ammunition Supply Decision-Making System Design Based on Fuzzy Control

This paper puts forward the basic ideas and goals of designing an assistant decision-making system for ammunition supply based on fuzzy control, together with the overall structure and functions of the system, by applying the basic theory and methods of fuzzy control; this establishes a foundation for developing the fuzzy-control-based assistant decision-making system.

Deyong Zhao, Xinfeng Wang, Jianguo Liu
The Concept of Approximation Based on Fuzzy Dominance Relation in Decision-Making

A method for decision-making systems that derives from the combination of rough sets and fuzzy sets is presented. It addresses a multicriteria classification that differs from usual classification problems; the main change is the substitution of the indiscernibility relation by a dominance relation. We propose the idea of a fuzzy dominance relation for decision-making and give the concept of an L-fuzzy set. A rough approximation of upward and downward cumulated sets is built up via the fuzzy dominance relation.

Yunxiang Liu, Jigui Sun, Shengsheng Wang
An Image Enhancement Arithmetic Research Based on Fuzzy Set and Histogram

This paper discusses a fuzzy enhancement algorithm based on the histogram for image enhancement, using fuzzy theory and the histogram. With the statistical character of the histogram and the flexibility of fuzzy sets, this algorithm can enhance interesting image areas freely, in contrast to enhancement algorithms that use the histogram alone or that depend on a threshold value that is difficult to select. The paper proposes the corresponding theory and the detailed process for carrying out the algorithm, and analyzes how to select the parameters x_c, r and F_e involved in it. The paper also gives an example of the application of the algorithm.

Liang Ming, Guihai Xie, Yinlong Wang
A Study on a Generalized FCM

In this paper, we unify several alternative FCM algorithms, including FCM, PFCM, CFCM, PIM, ICS, etc., into one model, called the generalized fuzzy c-means (GFCM). The GFCM model represents a wide variety of FCM algorithms. We find that the pseudo mass center of the data set is a fixed point of the GFCM under mild conditions. By judging the stability of the fixed point of the GFCM, we give a new approach to theoretically choosing the parameters in the GFCM model.

Jian Yu, Miin-shen Yang
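
Since the GFCM generalizes the classical FCM updates, a compact sketch of one standard FCM iteration may help fix notation; the fuzzifier m and the vectorized form are standard, but this is our illustration, not the paper's model.

```python
import numpy as np

def fcm_step(X, V, m=2.0):
    # X: (n, p) data, V: (c, p) centers; one classical FCM iteration
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
    # membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    Um = U ** m
    # center update: weighted mean of the data under fuzzified memberships
    V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, V_new

X = np.random.rand(100, 2)
V = X[np.random.choice(100, 3, replace=False)]   # initial centers
for _ in range(50):
    U, V = fcm_step(X, V)
```
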
Fuzzy Multiple Synapses Neural Network and Fuzzy Clustering

A fuzzy multiple synapses neural network is proposed, overcoming the limitation of the traditional Hopfield network, which handles only quadratic optimization problems. First, the fuzzy multiple synapses neuron and network are defined, and a hybrid neural network is given which combines the multiple synapses neural network with the Hopfield network. It may solve constrained optimization problems whose objective functions include not only high-order terms but also logarithmic, sinusoidal, and other forms. Second, it is applied to fuzzy clustering and a concrete hybrid neural network is derived. Experiments show that this method is effective. Finally, conclusions and future work are given.

Kai Li, Houkuan Huang, Jian Yu
On Possibilistic Variance of Fuzzy Numbers

The lower and upper possibilistic mean values of fuzzy numbers were introduced by Carlsson and Fullér. This paper introduces the concepts of lower and upper possibilistic variances of fuzzy numbers. These concepts are consistent with the extension principle and with the well-known definition of variance in probability theory. We also define a crisp possibilistic variance which differs from the one given by Carlsson and Fullér. Furthermore, we show that the lower and upper possibilistic variances of a linear combination of fuzzy numbers can be computed in a manner similar to probability theory.

Wei-Guo Zhang, Zan-Kan Nie
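
A numeric sketch of the Carlsson and Fullér quantities the paper builds on, evaluated for a triangular fuzzy number; the paper's own lower and upper possibilistic variances are refinements we do not attempt to reproduce here.

```python
import numpy as np

# For a fuzzy number with gamma-cuts [a1(g), a2(g)], Carlsson and Fullér define
#   M(A)   = integral_0^1 g * (a1(g) + a2(g)) dg        (possibilistic mean)
#   Var(A) = 1/2 * integral_0^1 g * (a2(g) - a1(g))**2 dg

def tri_cut(a, b, c, g):
    # gamma-cut of the triangular fuzzy number (a, b, c)
    return a + (b - a) * g, c - (c - b) * g

g = np.linspace(0.0, 1.0, 100001)
dg = g[1] - g[0]
a1, a2 = tri_cut(1.0, 2.0, 4.0, g)
M = np.sum(g * (a1 + a2)) * dg                  # approx 13/6 for (1, 2, 4)
Var = 0.5 * np.sum(g * (a2 - a1) ** 2) * dg     # approx 0.375
print(M, Var)
```
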

Granular Computing

Deductive Data Mining
Mathematical Foundation of Database Mining

In the ICDM02 Foundation of Data Mining and Discovery Workshop [3], we proposed that data mining is a procedure that transforms (extracts or discovers from) data into patterns/knowledge. Schematically: DM: Patterns ⇐ Data, or KD: Knowledge ⇐ Data.

Tsau Young Lin
Information Granules for Intelligent Knowledge Structures

The premise of this paper is that the acquisition, aggregation, merging and use of information requires some new ideas, tools and techniques which can simplify the construction, analysis and use of what we call ephemeral knowledge structures. Ephemeral knowledge structures are used and constructed by granular agents. Each agent contains its own granular information structure and granular information structures of agents can be combined together. The main concept considered in this paper is an information granule. An information granule is a concise conceptual unit that can be integrated into a larger information infrastructure consisting of other information granules and dependencies between them. The novelty of this paper is that it provides a concise and formal definition of a particular view of information granule and its associated operators, as required in advanced knowledge representation applications.

Patrick Doherty, Witold Łukaszewicz, Andrzej Szałas
Design and Implement for Diagnosis Systems of Hemorheology on Blood Viscosity Syndrome Based on GrC

This paper discusses the design and implementation of diagnosis software based on blood flow dynamic theory for blood viscosity syndrome (BVS). BVS is a clinical syndrome caused by one or several blood viscosity factors. Medical diagnosis and treatment software is a reasoning system that simulates the experience of clinical experts. In the system, the experience of experts is transformed into mathematical formulas using a rough-fuzzy and fuzzy-rough approach, and the reasoning system is then created from these formulas and granular computing. The development of the diagnosis software has been validated through application to several thousand clinical cases. The system is dynamic: it can learn from examples by visiting the case base, so additions, deletions, adjustments, and updates of treatment measures are implemented dynamically. Compared with similar systems, the formulas of this system are more complete; compared with doctors, its clinical diagnostic efficiency is higher.

Qing Liu, Feng Jiang, Dayong Deng
Granular Reasoning Using Zooming In & Out
Part 1. Propositional Reasoning (An Extended Abstract)

The concept of granular computing is applied to propositional reasoning; such reasoning is called granular reasoning in this paper. For this purpose, two operations called zooming in and out are introduced to reconstruct granules of possible worlds.

T. Murai, G. Resconi, M. Nakata, Y. Sato
A Pure Mereological Approach to Roughness

The present paper aims to establish an approach to roughness based on mereological relations between information granules rather than common set-theoretic relations. The atomic granule, the minimum information unit of an information system, is a triple encoding the semantics “an entity has the relevant attribute of a specified value”. Based on it, a granular calculus is built up. Using this calculus, lower and upper granule approximations are defined according to the underlying mereological relations, achieving roughness over the granular representation of an information system.

Bo Chen, Mingtian Zhou

Neural Networks and Evolutionary Computing

Knowledge Based Descriptive Neural Networks

This paper presents a study of knowledge based descriptive neural networks (DNN). A DNN is a neural network that incorporates rules extracted from trained neural networks. One of the major drawbacks of neural network models is that they cannot explain what they have done. Extracting rules from trained neural networks is one of the solutions. However, how to effectively use the extracted rules has received little attention. This paper addresses effective ways of using these extracted rules. With the introduction of DNN, we not only keep the good feature of nonlinearity in neural networks but also obtain an explanation of the underlying reasoning mechanisms, for instance, how a prediction is made.

J. T. Yao
Genetically Optimized Rule-Based Fuzzy Polynomial Neural Networks: Synthesis of Computational Intelligence Technologies

In this study, we introduce the concept of Rule-based Fuzzy Polynomial Neural Networks (RFPNN), a hybrid modeling architecture combining rule-based fuzzy neural networks (RFNN) and polynomial neural networks (PNN). We discuss their comprehensive design methodology. The development of the RFPNN dwells on the technologies of Computational Intelligence (CI), namely fuzzy sets, neural networks, and genetic algorithms. The architecture of the RFPNN results from a synergistic usage of RFNN and PNN. RFNN contribute to the formation of the premise part of the rule-based structure of the RFPNN. The consequence part of the RFPNN is designed using PNN. We discuss two kinds of RFPNN architectures and propose a comprehensive learning algorithm. In particular, it is shown that this network exhibits a dynamic structure. The experimental results include well-known software data such as the Medical Imaging System (MIS) dataset.

Sung-Kwun Oh, James F. Peters, Witold Pedrycz, Tae-Chon Ahn
Ant Colony Optimization for Navigating Complex Labyrinths

Navigating complex labyrinths is very difficult and time-consuming. Ant Colony Optimization (ACO) offers a new way to solve this problem. The main characteristics of ACO are positive feedback, distributed computation and the use of a constructive greedy heuristic. The object of this paper is to apply this approach to navigating complex labyrinths and finding shortest paths in traffic networks. For these problems, different pheromone updating rules were tested and different versions of ACO compared. The experiments indicate that ACO with a step-by-step updating rule is more efficient than the others, and the results also suggest that ACO may find wide application in traffic management.

Zhong Yan, Chun-Wie Yuan
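
A hedged sketch of the two ACO ingredients compared above: pheromone-biased selection of the next cell and a step-by-step (online) pheromone update; the maze encoding and all constants are invented for illustration.

```python
import random

def next_cell(current, neighbors, tau, alpha=1.0):
    # choose a neighbor with probability proportional to pheromone**alpha
    weights = [tau[(current, n)] ** alpha for n in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]

def step_update(tau, edge, rho=0.1, q=1.0):
    # step-by-step rule: local evaporation plus an immediate deposit on the
    # edge the ant has just crossed (global rules defer the deposit instead)
    tau[edge] = (1.0 - rho) * tau[edge] + rho * q
```
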
An Improved Quantum Genetic Algorithm and Its Application

An improved quantum genetic algorithm (IQGA) is proposed in this paper. In IQGA, the quantum gate is updated using the best solution, and a population catastrophe strategy is introduced. Typical function tests show that the convergence speed of IQGA is faster than that of the quantum genetic algorithm (QGA) and several other GAs, and that IQGA can also remedy the prematurity of QGA. Simulations of FIR filter design demonstrate that IQGA is superior to QGA, the methods in reference [5], and the traditional method.

Gexiang Zhang, Weidong Jin, Na Li
Intelligent Generation of Candidate Sets for Genetic Algorithms in Very Large Search Spaces

We have been working on how to select safety measures for space missions in an optimal way. The main limitation on the measures that can be performed is cost. There are often hundreds of possible measures, and each measure has an associated cost and an effectiveness that describes its potential to reduce the risk to the mission goals. A computer search of such an enormous search space is not practical if every combination is evaluated. It was therefore decided to use an evolutionary algorithm to improve the efficiency of the search. A simple approach would lead to many sets of solutions that are wildly expensive and so infeasible. Preselecting candidates which meet the cost goals reduces the problem to a manageable size. Preselection is based on rough set theory, since cost goals are usually not rigid. This paper describes the methodology for ensuring every candidate is roughly comparable in cost.

Julia R. Dunphy, Jose J. Salcedo, Keri S. Murphy
Fast Retraining of Artificial Neural Networks

In this paper we propose a practical mechanism for extracting information directly from the weights of a reference artificial neural network (ANN). We use this information to train a structurally identical ANN that has some variations of the global transformation input-output function. To be able to fulfill our goal, we reduce the reference network weights by a scaling factor. The evaluation of the computing effort involved in the retraining of some ANNs shows us that a good choice for the scaling factor can substantially reduce the number of training cycles independent of the learning methods. The retraining mechanism is analyzed for the feedforward ANNs with two inputs and one output.

Dumitru-Iulian Nastac, Razvan Matei
Fuzzy-ARTMAP and Higher-Order Statistics Based Blind Equalization

This paper discusses a blind equalization technique for FIR channel systems, which may or may not be minimum phase, in digital communication. The proposed technique consists of two parts: one estimates the original channel coefficients based on fourth-order cumulants of the channel output; the other employs a fuzzy-ARTMAP neural network to model an inverse system for the original channel. In simulation studies, the performance of the proposed blind equalizer is compared with both linear and other neural-network-based equalizers.

Dong-kun Jee, Jung-sik Lee, Ju-Hong Lee
Comparison of BPL and RBF Network in Intrusion Detection System

In this paper, we present performance comparison results of the backpropagation learning (BPL) algorithm in a multilayer perceptron (MLP) neural network and the radial basis function (RBF) network for intrusion detection. The results show that the RBF network improves the performance of intrusion detection systems (IDSs) in anomaly detection, with a high detection rate and a low false positive rate. The RBF network also requires less training time and can be optimized to balance the detection and false positive rates.

Chunlin Zhang, Ju Jiang, Mohamed Kamel
Back Propagation with Randomized Cost Function for Training Neural Networks

A novel method to improve both the generalization and convergence performance of the back propagation algorithm (BP) by using multiple cost functions with a randomizing scheme is proposed in this paper. Under certain conditions, the randomized technique will converge to the global minimum with probability one. Experimental results on benchmark Encoder-Decoder problems and the NC2 classification problem show that the method is effective in enhancing BP’s convergence and generalization performance.

H. A. Babri, Y. Q. Chen, Kamran Ahsan

Data Mining, Machine Learning, and Pattern Recognition

Selective Ensemble of Decision Trees

An ensemble is generated by training multiple component learners for the same task and then combining their predictions. In most ensemble algorithms, all the trained component learners are employed in constituting an ensemble. Recently, however, it has been shown that when the learners are neural networks, it may be better to ensemble some instead of all of them. In this paper, this claim is generalized to situations where the component learners are decision trees. Experiments show that ensembles generated by a selective ensemble algorithm, which selects some of the trained C4.5 decision trees to make up an ensemble, may be not only smaller in size but also stronger in generalization than ensembles generated by non-selective algorithms.

Zhi-Hua Zhou, Wei Tang
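
The flavor of "ensembling some instead of all" can be shown in a few lines of Python with scikit-learn; bagged C4.5-style trees are approximated by DecisionTreeClassifier, and a greedy validation-set selection stands in for the paper's selection algorithm, so this is a sketch of the idea rather than the authors' method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(20):                       # bagging: train on bootstrap samples
    idx = rng.integers(0, len(X_tr), len(X_tr))
    trees.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

def vote_acc(members):
    # accuracy of the majority vote of the given trees on the validation set
    votes = np.mean([t.predict(X_val) for t in members], axis=0)
    return np.mean((votes >= 0.5).astype(int) == y_val)

selected = []
for t in sorted(trees, key=lambda t: t.score(X_val, y_val), reverse=True):
    if not selected or vote_acc(selected + [t]) >= vote_acc(selected):
        selected.append(t)                # keep a tree only if it does not hurt

print(f"all {len(trees)} trees: {vote_acc(trees):.3f}; "
      f"selected {len(selected)}: {vote_acc(selected):.3f}")
```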
A Maximal Frequent Itemset Algorithm

We present MinMax, a new algorithm for mining maximal frequent itemsets (MFI) from a transaction database. It is based on depth-first traversal and is iterative, and it combines a vertical tidset representation of the database with effective pruning mechanisms. MinMax removes all non-maximal frequent itemsets to obtain the exact set of MFI directly, without enumerating all frequent itemsets from smaller ones step by step, and it backtracks directly to the proper ancestor rather than level by level. We found MinMax more effective than GenMax, a state-of-the-art algorithm for finding maximal frequent itemsets, at pruning the search space to obtain the exact set of MFI.

Hui Wang, Qinghua Li, Chuanxiang Ma, Kenli Li
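
For readers new to the notion, the following brute-force Python sketch shows what the exact set of MFI is on a made-up toy database; MinMax itself avoids this kind of full enumeration via depth-first search over vertical tidsets with pruning, so the code illustrates only the definition.

```python
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
                {"a", "b", "c"}]
min_sup = 3
items = sorted(set().union(*transactions))

def support(itemset):
    # number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

frequent = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(set(c)) >= min_sup]

# maximal frequent itemsets: frequent itemsets with no frequent proper superset
maximal = [f for f in frequent if not any(f < g for g in frequent)]
print(maximal)                            # here: {a,b}, {a,c}, {b,c}
```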
On Data Mining for Direct Marketing

Direct marketing is a new business model based on interactive one-to-one communication between marketer and customer. There is great potential for data mining to make useful contributions to the marketing discipline for business intelligence. This paper provides an overview of recent developments in data mining applications for direct marketing.

Chuangxin Ou, Chunnian Liu, Jiajing Huang, Ning Zhong
A New Incremental Maintenance Algorithm of Data Cube

A well-known challenge in data warehousing is the efficient incremental maintenance of the data cube in the presence of source data updates. In this paper, we present a new incremental maintenance algorithm developed from Mumick's algorithm. Instead of using one auxiliary delta table, we use two to improve the efficiency of data updates. Moreover, when a materialized view has to be recomputed, we use the data of its smallest ancestral view, whereas Mumick uses the fact table, which is usually much larger than the smallest ancestor. We have implemented this algorithm and found that its performance improves significantly.

Hongsong Li, Houkuan Huang, Youfang Lin
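
The general delta-table style of maintenance can be sketched in a few lines of Python; the single SUM measure, the one dimension, and the use of one insertion and one deletion delta below are illustrative assumptions for exposition, not the paper's algorithm or Mumick's.

```python
from collections import defaultdict

view = defaultdict(float)                  # materialized view: dim -> sum(measure)
for dim, m in [("east", 10), ("west", 5), ("east", 7)]:
    view[dim] += m

delta_insert = [("east", 3), ("north", 4)] # source rows added since last refresh
delta_delete = [("west", 5)]               # source rows removed since last refresh

# apply the deltas to the view instead of recomputing from the fact table
for dim, m in delta_insert:
    view[dim] += m
for dim, m in delta_delete:
    view[dim] -= m
    if view[dim] == 0:
        del view[dim]

print(dict(view))                          # {'east': 20, 'north': 4}
```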
Data Mining for Motifs in DNA Sequences

In the large collections of genomic information accumulated in recent years there is potentially significant knowledge for exploitation in medicine and in the pharmaceutical industry. One interesting approach to the distillation of such knowledge is to detect strings in DNA sequences which are very repetitive within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients who have been classified in some way, say as sharing a particular medical diagnosis). Motifs are strings that occur relatively frequently. In this paper we present basic theory and algorithms for finding such frequent and common strings. We are particularly interested in strings which are maximally frequent, and, having discovered very frequent motifs, we show how to mine association rules by an existing rough-set-based technique. Further work and applications are in progress.

D. A. Bell, J. W. Guan
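
A minimal sliding-window sketch of "frequent within a sequence" versus "common across sequences" follows; the toy sequences, the fixed motif length k, and the thresholds are illustrative, and the paper's algorithms for maximally frequent strings are more sophisticated than this count.

```python
from collections import Counter

def kmer_counts(seq, k):
    # count every length-k substring of the sequence
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

sequences = ["ACGTACGTGACG", "TTACGTACGAAC"]   # toy "patient" sequences
k, min_count = 4, 2

per_seq = [kmer_counts(s, k) for s in sequences]
common = set(per_seq[0])
for c in per_seq[1:]:
    common &= set(c)                            # motifs shared across sequences

frequent = {m for c in per_seq for m, n in c.items() if n >= min_count}
print("shared motifs:", sorted(common))
print("frequent within a sequence:", sorted(frequent))
```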
Maximum Item First Pattern Growth for Mining Frequent Patterns

Frequent pattern mining plays an essential role in many important data mining tasks. FP-Growth, an algorithm which mines frequent patterns in the frequent pattern tree (FP-tree), is very efficient. However, it still encounters performance bottlenecks when creating conditional FP-trees recursively during the mining process. In this work, we propose a new algorithm, called Maximum-Item-First Pattern Growth (MIFPG), for mining frequent patterns. MIFPG searches the FP-tree in a depth-first, top-down manner, as opposed to the bottom-up order of FP-Growth. Its key idea is that maximum items are always considered first when the current pattern grows. In this way, no concrete realization of conditional pattern bases is needed, and the major operations of mining are counting and link adjusting, which are usually inexpensive. Experiments show that, in comparison with FP-Growth, our algorithm is about three times faster and consumes less memory; it also scales well in time and space with the number of transactions.

Hongjian Fan, Ming Fan, Bingzheng Wang
Extended Random Sets for Knowledge Discovery in Information Systems

In this paper, we discuss the problem of knowledge discovery in information systems. A model is presented for users to obtain either “objective” interesting rules or “subjective” judgments of meaningful descriptions based on their needs. Extended random sets are first presented to describe the relationships between condition granules and decision granules. An interpretation is then given to show what we can obtain from the extended random sets.

Yuefeng Li
Research on a Union Algorithm of Multiple Concept Lattices

The concept lattice has played an important role in data mining and data processing. This paper gives definitions of same-field contexts, consistent contexts, same-field concept lattices, and consistent concept lattices; provides definitions of the addition operation on two same-field, consistent contexts as well as the union operation on two same-field, consistent concept lattices; and proves that the two operations are isomorphic and satisfy other interesting mathematical properties, such as the commutative and associative laws, with left and right identity elements. From the definitions and properties of the union operation, a union algorithm for multiple concept lattices is derived, in which heuristic knowledge from the order relation of the concepts is used, so the time efficiency of the algorithm is improved. Experiments show that using the algorithm to merge two concept lattices distributed on different sites into one is clearly superior to using Godin’s algorithm to insert the objects of the formal context corresponding to the second lattice one by one into the first. The algorithm provided in the paper is thus an effective parallel algorithm for constructing concept lattices.

Zongtian Liu, Liansheng Li, Qing Zhang
A Theoretical Framework for Knowledge Discovery in Databases Based on Probabilistic Logic

In order to further improve the KDD process in terms of both the degree of automation achieved and types of knowledge discovered, we argue that a formal logical foundation is needed and suggest that Bacchus’ probability logic is a good choice. By completely staying within the expressiveness of Bacchus’ probability logic language, we give formal definitions of a “pattern” as well as its determiners, which are “previously unknown” and “potentially useful”. These definitions provide a sound foundation to overcome several deficiencies of current KDD systems with respect to novelty and usefulness judgment. Furthermore, based on this logic, we propose a logic induction operator that defines a standard process through which all the potentially useful patterns embedded in the given data can be discovered. This logic induction operator provides a formal characterization of the “discovery” process itself.

Ying Xie, Vijay V. Raghavan
An Improved Branch & Bound Algorithm in Feature Selection

The Branch & Bound (B&B) algorithm is a globally optimal feature selection method, but its high computational complexity is a well-known problem. The B&B algorithm constructs a search tree and then searches for the optimal feature subset in the tree. Previous work on the B&B algorithm focused on simplifying the search tree in order to reduce the search complexity, and several improvements already exist. A detailed analysis of the basic B&B algorithm and existing improvements is given under a common framework in which all the algorithms are compared. Based on this analysis, an improved B&B algorithm, BBPP+, is proposed. Experimental comparison shows that BBPP+ performs best.

Zhenxiao Wang, Jie Yang, Guozheng Li
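
The soundness of B&B pruning rests on the monotonicity of the selection criterion; the Python sketch below, with a toy additive criterion and illustrative per-feature scores (not the BBPP+ algorithm), shows the basic bound at work.

```python
scores = {0: 0.9, 1: 0.1, 2: 0.7, 3: 0.4, 4: 0.3}    # toy per-feature utility

def criterion(features):
    # monotone: removing a feature can only lower the criterion value
    return sum(scores[f] for f in features)

def branch_and_bound(all_features, target_size):
    best = {"J": -1.0, "set": None}

    def search(current, candidates):
        # candidates: features still allowed to be removed (avoids duplicates)
        if criterion(current) <= best["J"]:
            return                                    # prune by monotonicity
        if len(current) == target_size:
            best["J"], best["set"] = criterion(current), set(current)
            return
        for i, f in enumerate(candidates):
            if len(current) - 1 >= target_size:
                search(current - {f}, candidates[i + 1:])

    search(set(all_features), sorted(all_features))
    return best["set"], best["J"]

print(branch_and_bound(set(scores), target_size=2))   # -> ({0, 2}, 1.6)
```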
Classification of Caenorhabditis Elegans Behavioural Phenotypes Using an Improved Binarization Method

Because it is a simple model organism, Caenorhabditis (C.) elegans is often used in genetic analysis in neuroscience. The classification and analysis of C. elegans was previously performed subjectively, so the results of classification were unreliable and often imprecise. For this reason, automated video capture and analysis systems have appeared. In this paper, we propose an improved binarization method using a hole detection algorithm. With our method, we can preserve the hole and remove the noise, so that the accuracy of the extracted features is improved. In order to improve the classification success rate, we add new feature sets to the features of previous work. We also add 3 more mutant types of worms to the previous 6 types, and then analyze their behavioural characteristics.

Won Nah, Joong-Hwan Baek
Consensus versus Conflicts — Methodology and Applications

This paper is dedicated to applications of consensus methods in solving conflicts as defined by Pawlak [12, 13]. The most characteristic feature of the consensus methods surveyed here is that the structures of the opinions of conflict participants are multi-valued and multi-attribute, and consensus determination is based on distance functions between tuples. The paper presents an overview of consensus methods and their applications to conflict solving.

Ngoc Thanh Nguyen, Janusz Sobecki
Interpolation Techniques for Geo-spatial Association Rule Mining

Association rule mining has become an important component of information processing systems due to the significant increase in its applications. In this paper, our main objective is to find which interpolation approaches are best suited for discovering geo-spatial association rules from unsampled points. We investigate and integrate two interpolation approaches, which we call pre-interpolation and post-interpolation, into our geo-spatial association rule mining algorithm.

Dan Li, Jitender Deogun, Sherri Harms
Imprecise Causality in Mined Rules

Causality occupies a central position in human reasoning. It plays an essential role in commonsense decision-making. Data mining hopes to extract unsuspected information from very large databases. The results are inherently soft or fuzzy, as the data is generally both incomplete and inexact. The best known data mining methods build rules. Association rules indicate the associative strength of data attributes. Much of the interest in association rules is that they seem to suggest causal, or at least predictive, relationships. Whether any association rules can be said to express a causal relationship needs to be examined. In part, the utility of mined association rules depends on whether the rule is causal or coincidental. This paper explores some of the factors that impact causality in mined rules.

Lawrence J. Mazlack
Sphere-Structured Support Vector Machines for Multi-class Pattern Recognition

Support vector machines (SVMs) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. For solving multi-class classification problems, there are methods such as one-against-rest, one-against-one, and all-together, but the computing time of all these methods is too long for large-scale problems. In this paper, SVM architectures for multi-class problems are discussed; in particular, we provide a new algorithm, called sphere-structured SVMs, to solve the multi-class problem. We present the algorithm in detail and analyze its characteristics. Not only is the number of convex quadratic programming problems in sphere-structured SVMs small, but each problem also involves the fewest variables, so the computing time of classification is reduced. Moreover, the characteristics of sphere-structured SVMs make it easy to incorporate additional data.

Meilin Zhu, Yue Wang, Shifu Chen, Xiangdong Liu
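
A heavily simplified Python sketch of the sphere idea follows: one sphere per class, with new points assigned to the nearest sphere surface. Here the spheres are crudely the class mean and maximum radius rather than the QP-fitted minimal enclosing spheres the paper solves for; all data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 0.5, (40, 2))      # toy class 0
X1 = rng.normal([3, 3], 0.5, (40, 2))      # toy class 1

spheres = {}
for label, Xc in ((0, X0), (1, X1)):
    center = Xc.mean(axis=0)
    radius = np.max(np.linalg.norm(Xc - center, axis=1))
    spheres[label] = (center, radius)

def classify(x):
    # signed distance to each sphere's surface (negative means inside);
    # pick the class whose sphere the point is closest to / deepest inside
    return min(spheres,
               key=lambda c: np.linalg.norm(x - spheres[c][0]) - spheres[c][1])

print(classify(np.array([0.2, -0.1])), classify(np.array([2.8, 3.3])))  # 0 1
```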
HIPRICE-A Hybrid Model for Multi-agent Intelligent Recommendation

We explore the acquisition of user profiles by unobtrusively monitoring the browsing behaviour of users, applying supervised machine-learning techniques coupled with an ontological representation to extract user preferences. The HIPRICE recommender system is presented and an empirical evaluation of this approach is conducted. The performance of the integrated system is measured and presented as well.

ZhengYu Gong, Jing Shi, HangPing Qiu
A Database-Based Job Management System

By combining database and job management technology, this paper designs a database-based job management system (JMS), called DB-Based JMS. The system architecture is described, and the functions and relationships of the components in the system are defined. For network computing environments, two kinds of DB-Based JMS cluster models are provided; their working modes are detailed, and their advantages and disadvantages are compared. In addition, scheduling granularity is discussed.

Ji-chuan Zheng, Zheng-guo Hu, Liang-liang Xing
Optimal Choice of Parameters for a Density-Based Clustering Algorithm

Clustering is an important and challenging task in data mining. As a generalized density-based clustering method, the DENCLUE algorithm has many remarkable properties, but the quality of its clustering results strongly depends on an adequate choice of two parameters: the density parameter σ and the noise threshold ξ. In this paper, by investigating the influence of these two parameters on the clustering results, we first show that an optimal σ should be chosen to obtain good clustering results. Then, an entropy-based method is proposed for the optimal choice of σ. Further, the noise threshold ξ is estimated to produce a reasonable clustering pattern. Finally, experiments are performed to illustrate the effectiveness of our methods.

Wenyan Gan, Deyi Li
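
One plausible reading of an entropy-based choice of σ is sketched below in Python: score each candidate σ by the Shannon entropy of the normalized kernel density at the data points, since a σ that is far too small or too large flattens the density toward uniform and hence toward maximal entropy. The exact criterion in the paper may differ; the data and candidate values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),    # two toy Gaussian clusters
               rng.normal(3, 0.3, (50, 2))])

def density(X, sigma):
    # Gaussian-kernel density estimate evaluated at every data point
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum(1)

for sigma in (0.05, 0.3, 1.0, 5.0):
    p = density(X, sigma)
    p = p / p.sum()
    entropy = -(p * np.log(p)).sum()
    print(f"sigma={sigma:<4} entropy={entropy:.3f}")   # lower = more structure
```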
An Improved Parameter Tuning Method for Support Vector Machines

Support vector machines (SVMs) are a very important tool for data mining. However, the need to tune parameters manually limits their application in practical environments. In this paper, after analyzing the limitations of existing approaches, a new methodology for tuning kernel parameters, based on computing the gradient of a penalty function with respect to the RBF kernel parameters, is proposed. Simulation results reveal the feasibility of this new approach and demonstrate an improvement in generalization ability.

Yong Quan, Jie Yang
Approximate Algorithm for Minimization of Decision Tree Depth

In this paper a greedy algorithm for minimization of decision tree depth is described and bounds on the algorithm’s precision are considered. The algorithm is adapted for application to data tables with both discrete and continuous variables, which can have missing values; to this end we transform a given data table into a decision table. Under a natural assumption about the class NP, the considered algorithm is close to the best approximate polynomial algorithms for minimization of decision tree depth.

Mikhail J. Moshkov
Virtual Reality Representation of Information Systems and Decision Rules: An Exploratory Technique for Understanding Data and Knowledge Structure

This paper introduces a virtual reality technique for visual data mining on heterogeneous information systems. The method is based on parametrized mappings between heterogeneous spaces with extended information systems and a virtual reality space. Such mappings can also be constructed for unions of heterogeneous and incomplete data sets together with knowledge bases composed of decision rules. The approach has been applied successfully to a wide variety of real-world domains, and examples are presented from genomic research and geology.

Julio J. Valdés
Hierarchical Clustering Algorithm Based on Neighborhood-Linked in Large Spatial Databases

A novel hierarchical clustering algorithm based on neighborhood linkage is proposed in this paper. Unlike traditional hierarchical clustering algorithms, the new model adopts only two steps: preliminary clustering and merging. The algorithm can be applied to high-dimensional data sets and can find clusters of arbitrary shape. Furthermore, it can handle not only data with numeric attributes but also data with boolean and categorical attributes. The results of our experimental study on data sets with clusters of arbitrary shape and size are very encouraging. We also conduct an experimental study with web log files, which helps us discover user access patterns effectively. Our study shows that this algorithm generates better quality clusters than traditional algorithms and scales well for large spatial databases.

Yi-hong Dong
Unsupervised Learning of Pattern Templates from Unannotated Corpora for Proper Noun Extraction

This paper describes an approach to extracting proper nouns from very large text corpora without using a lexicon or cue word dictionary. First, we train patterns for extracting proper nouns by applying an initial set of proper names to unannotated corpora that do not yet have any tags. We then repeatedly apply the pattern templates to the corpora to extract new proper nouns until a stopping point is reached.

Seung-Shik Kang, Chong-Woo Woo
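
A toy Python sketch of the bootstrap loop follows; the lower-cased corpus, the single-word left-context patterns, and the fixed number of rounds are illustrative assumptions, and the paper's pattern templates are richer than this.

```python
import re

corpus = ("president kim visited seoul . president lee met reporters . "
          "mayor park spoke in busan .")
seeds = {"kim"}
patterns, names = set(), set(seeds)

for _ in range(3):                                   # a few bootstrap rounds
    # harvest context patterns from the currently known names
    for name in list(names):
        for m in re.finditer(r"(\w+) " + re.escape(name), corpus):
            patterns.add(m.group(1))                 # e.g. a title word
    # apply the patterns back to the corpus to find new names
    for p in patterns:
        for m in re.finditer(re.escape(p) + r" (\w+)", corpus):
            names.add(m.group(1))

print(sorted(patterns), sorted(names - seeds))       # ['president'] ['lee']
```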
Approximate Aggregate Queries with Guaranteed Error Bounds

It is very important to provide analysts with guaranteed error bounds for approximate aggregate queries in many current enterprise applications, such as decision support systems. In this paper, we propose a general technique to provide tight error bounds for approximate results to OLAP range-sum queries. We perform extensive experiments on diverse data sets and examine the effectiveness of our proposed method with respect to various dimensions of the data cube and query sizes.

Seok-Ju Chun, Ju-Hong Lee, Seok-Lyong Lee
Improving Classification Performance by Combining Multiple TAN Classifiers

Boosting is an effective classifier combination method that can improve the classification performance of an unstable learning algorithm, but it yields little improvement for a stable learning algorithm. TAN, Tree-Augmented Naive Bayes, is a tree-like Bayesian network. The standard TAN learning algorithm generates a stable TAN classifier whose accuracy is difficult to improve by boosting. In this paper, a new TAN learning algorithm called GTAN and a TAN classifier combination method called Boosting-MultiTAN are presented. Comparisons with the standard TAN classifier in the experiments show that Boosting-MultiTAN achieves higher classification accuracy than the standard TAN classifier on most data sets.

Hongbo Shi, Zhihai Wang, Houkuan Huang
Image Recognition Using Adaptive Fuzzy Neural Network and Wavelet Transform

We propose to design and implement an image recognition system in software and hardware using features extracted from the wavelet transform (WT) of the image as input to a pattern classifier. The wavelet transform will be computed via an adaptive neural network, while the pattern classification will be carried out by an adaptive fuzzy neural network. Thus, the system will be fully parallel and distributive.

Huanglin Zeng, Yao Yi
SOM Based Image Segmentation

Image segmentation plays an important role in image retrieval systems. In this paper, a method for segmenting images based on a SOM neural network is proposed. First, the pixels are clustered based on their color and spatial features, where the clustering process is accomplished with a SOM network. Then, the clustered blocks are merged into a specified number of regions. Experiments show that these regions can be regarded as segmentation results preserving some semantic meaning. This approach thus provides a feasible new solution for image segmentation, which may be helpful in image retrieval.

Yuan Jiang, Ke-Jia Chen, Zhi-Hua Zhou
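
A compact Python sketch of the clustering step is given below, assuming each pixel is a color-plus-scaled-position feature vector and a small 1-D SOM; the toy image, network size, and decay schedules are illustrative, and the paper's subsequent region-merging stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 32, 32
img = np.zeros((h, w, 3))
img[:, : w // 2] = [1.0, 0.2, 0.2]          # toy image: red left, blue right
img[:, w // 2 :] = [0.2, 0.2, 1.0]
img += rng.normal(0, 0.05, img.shape)

# per-pixel features: (r, g, b, x/w, y/h)
ys, xs = np.mgrid[0:h, 0:w]
feats = np.column_stack([img.reshape(-1, 3), xs.ravel() / w, ys.ravel() / h])

n_nodes, lr, radius = 4, 0.5, 1.5
weights = rng.random((n_nodes, feats.shape[1]))
for epoch in range(10):
    for f in feats[rng.permutation(len(feats))]:
        bmu = np.argmin(((weights - f) ** 2).sum(1))     # best matching unit
        dist = np.abs(np.arange(n_nodes) - bmu)
        h_neigh = np.exp(-(dist ** 2) / (2 * radius ** 2))
        weights += lr * h_neigh[:, None] * (f - weights) # pull toward sample
    lr *= 0.8
    radius *= 0.8

labels = np.argmin(((feats[:, None, :] - weights) ** 2).sum(-1), axis=1)
print(np.unique(labels.reshape(h, w)[:, : w // 2]),      # left-half clusters
      np.unique(labels.reshape(h, w)[:, w // 2 :]))      # right-half clusters
```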
User’s Interests Navigation Model Based on Hidden Markov Model

To Find user’s frequent interest navigation patterns, we combine the information of web content and web server log, mine the web data to build a hidden markov model. In our approach, we build a hidden markov model according to page’s content and web service log firstly, and then we present a new incremental discovery algorithm Hmm_R to discover the interest navigation patterns. ...

Jing Shi, Fang Shi, HangPing Qiu
Successive Overrelaxation for Support Vector Regression

Support vector regression (SVR) is an important tool for data mining. In this paper, we first introduce a new way to give SVR a mathematical form similar to that of support vector classification. We then propose a versatile iterative method, successive overrelaxation, for the solution of extremely large regression problems using support vector machines. Experiments show that this new method converges considerably faster than other methods that require the presence of a substantial amount of the data in memory.

Yong Quan, Jie Yang, Chenzhou Ye
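
The SOR iteration itself is easy to show on a small linear system, as in the Python sketch below; the paper applies the same style of componentwise sweep (with clipping to the box constraints of the dual) to extremely large SVR problems, so this only illustrates the underlying numerical method on illustrative data.

```python
import numpy as np

def sor(A, b, omega=1.3, iters=200, tol=1e-10):
    """Successive overrelaxation for Ax = b, sweeping one component at a time."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x_old = x.copy()
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]        # off-diagonal contribution
            # blend the old value with the Gauss-Seidel update via omega
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(sor(A, b), np.linalg.solve(A, b))              # the two should agree
```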
Statistic Learning and Intrusion Detection

The goal of intrusion detection is to determine whether there are illegal or dangerous actions or activities in the system by checking audit data on local machines or information gathered from the network. It can also be viewed as the problem of finding the relationship between the audit data or gathered network information and the states of the system to be protected, i.e., normal or abnormal. Statistical learning theory studies precisely this problem of finding unknown relationships from limited-size samples. The theory is introduced briefly. By modeling the key process of intrusion detection, the relationship between the two problems is established, and the possibility of using methods from statistical learning theory in intrusion detection is analyzed. Finally, a recent fruit of statistical learning theory, Support Vector Machines, is used in a simulation of network intrusion detection on the DARPA data. The simulation results show that support vector machines can detect intrusions very successfully and overcome many disadvantages of currently available methods: they lower the false positive rate while achieving a higher detection rate, and, since they use small samples, they greatly shorten the training time.

Xian Rao, Cun-xi Dong, Shao-quan Yang
A New Association Rules Mining Algorithms Based on Directed Itemsets Graph

In this paper, we introduce a new data structure called DISG (directed itemsets graph), in which the information of frequent itemsets is stored. Based on it, a new algorithm called DBDG (DFS-based DISG) is developed using a depth-first search strategy. Finally, we perform an experiment on a real dataset to measure the running time of DBDG. The experiment shows that it is efficient for mining dense datasets.

Lei Wen, Minqiang Li
A Distributed Multidimensional Data Model of Data Warehouse

The base fact tables of a data warehouse are generally huge. In order to shorten query response time and to improve maintenance performance, a base fact table is generally partitioned into multiple instances with the same schema. To facilitate multidimensional data analysis, the common approach is to create a different multidimensional data model (MDDM) for each instance, which may make it difficult to realize transparent multi-instance queries and maintenance. To resolve these problems, this paper proposes a logically integrated but physically distributed multidimensional data model.

Youfang Lin, Houkuan Huang, Hongsong Li

Logics and Reasoning

An Overview of Hybrid Possibilistic Reasoning

The objective of this paper is to introduce the hybrid logic methodology into possibilistic reasoning. It is well known that possibilistic logic has a strong modal logic flavor; however, modal logic lacks the capability of referring to states or possible worlds, even though states are crucial to its semantics. Hybrid logic circumvents the problem by blending the classical logic mechanism into modal logic: it is a hybrid of classical and modal logic, but unlike classical logic it treats terms and formulas uniformly. We study some variants of hybrid possibilistic logic, including hybrid qualitative possibility logic, hybrid graded modal logic, and hybrid possibilistic description logic. The syntax and semantics of the logics are presented, and some possible applications of the proposed logics are suggested.

Churn-Jung Liau
Critical Remarks on the Computational Complexity in Probabilistic Inference

In this paper, we review the historical development of using probability for managing uncertain information from the inference perspective. We discuss in particular the NP-hardness of probabilistic inference in Bayesian networks.

S. K. M. Wong, D. Wu, Y. Y. Yao
Critical Remarks on the Maximal Prime Decomposition of Bayesian Networks

We present a critical analysis of the maximal prime decomposition of Bayesian networks (BNs). Our analysis suggests that it may be more useful to transform a BN into a hierarchical Markov network.

Cory J. Butz, Qiang Hu, Xue Dong Yang
A Non-local Coarsening Result in Granular Probabilistic Networks

In our earlier works, we coined the phrase granular probabilistic reasoning and showed a local coarsening result. In this paper, we present a non-local method for coarsening variables (i.e., the variables are spread throughout the network) and establish its correctness.

Cory J. Butz, Hong Yao, Howard J. Hamilton
Probabilistic Inference on Three-Valued Logic

In this paper, we extend Nilsson’s probabilistic logic [1] to allow each sentence S to have three sets of possible worlds. We adopt the ideas of consistency and possible worlds introduced by Nilsson in [1], but propose a new method, called the linear equation method, to deal with problems of probabilistic inference; the results of our method are consistent with those of Yao’s interval-set model method.

Guilin Qi
Multi-dimensional Observer-Centred Qualitative Spatial-temporal Reasoning

The multi-dimensional spatial occlusion relation (MSO) is an observer-centred model which can express the relation between the images of two bodies x and y from a viewpoint v. We study the basic reasoning method of MSO, extend MSO to the spatio-temporal domain by adding a time feature, and give the related definitions.

Yi-nan Lu, Sheng-sheng Wang, Sheng-xian Sha

Multi-agent Systems

Architecture Specification for Design of Agent-Based System in Domain View

How to engineer agent-based systems is an essential factor in developing agents or agent-based systems. Existing methodologies for agent-based system development focus on the development phases and the activities in each phase, and do not sufficiently consider system organization and performance. It is important to give designers detailed guidance on how an agent-based system can be organized and specified. This paper defines the computing environment from a domain view, proposes methods for identifying domains, deploying agents, and organizing the system, and analyzes the characteristics of the established system organization.

S. K. Lee, Taiyun Kim
Adapting Granular Rough Theory to Multi-agent Context

The present paper focuses on adapting Granular Rough Theory to a multi-agent system. By transforming the original triple-form atomic granule into a quadruple, we encapsulate an agent-specific viewpoint into information granules, meaning “an agent knows/believes that a given entity has the attribute type with the specific value”. Then a quasi-Cartesian qualitative coordinate system named the Granule Space is defined to visualize information granules according to their agent views, entity identities, and attribute types. We extend Granular Rough Theory into new versions applicable to the 3-D information cube based M-Information System. Challenges that the MAS context poses to rough approaches are then analyzed, in the form of an obvious puzzle. While leaving systematic solutions as open issues, we suggest auxiliary measurements to alleviate, or at least to evaluate, the invalidity of the rough approach in MAS.

Bo Chen, Mingtian Zhou
How to Choose the Optimal Policy in Multi-agent Belief Revision?

In a multi-agent system, it is not enough to maintain the coherence of a single agent’s belief set. Through communication, one agent removing or adding a belief sentence may influence the certainty of other agents’ belief sets. In this paper, we investigate the features of the belief revision process in different multi-agent systems and put forward a uniform framework, Bereframe. In Bereframe, there are objectives beyond maintaining the coherence or maximizing the certainty of a single belief revision process, and an agent must choose the optimal revision policy to realize them. A credibility measure is introduced, which is used to compute the maximum credibility degree of the knowledge background. In a cooperative multi-agent system, agents choose the combinative policy to maximize the certainty of the whole system’s belief set according to the welfare principle; in a semi-competitive multi-agent system, agents choose the revision policy according to the Pareto efficiency principle; and in a competitive multi-agent system, the Nash equilibrium criterion is applied to the agents’ belief revision processes.

Yang Gao, Zhaochun Sun, Ning Li

Web Intelligence and Intelligent Systems

Research of Atomic and Anonymous Electronic Commerce Protocol

Atomicity and anonymity are two important requirements for applications in electronic commerce. However, absolute anonymity may lead to conflicts with law enforcement, e.g., blackmailing or money laundering. Therefore, it is important to design systems satisfying both atomicity and revocable anonymity. Based on the concept of two-phase commitment, we realize atomicity in electronic transactions with a Trusted Third Party as coordinator. We also develop Brands’ fair signature model and propose a method that enables not only anonymity but also owner tracing and money tracing.

Jie Tang, Juan-Zi Li, Ke-Hong Wang, Yue-Ru Cai
Colored Petri Net Based Attack Modeling

A Colored Petri Net (CPN) based attack modeling approach is addressed. A CPN-based attack model is flexible enough to model Internet intrusions, including the static and dynamic features of the intrusion. The processes and rules for building a CPN-based attack model from an attack tree are also presented. In order to evaluate the risk of intrusion, cost elements are added to the CPN-based attack model. Experience also shows that it is easy to exploit the CPN-based attack modeling approach to provide controlling functions.

Shijie Zhou, Zhiguang Qin, Feng Zhang, Xianfeng Zhang, Wei Chen, Jinde Liu
Intelligent Real-Time Traffic Signal Control Based on a Paraconsistent Logic Program EVALPSN

In this paper, we introduce an intelligent real-time traffic signal control system based on a paraconsistent logic program called EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), which can deal with contradiction and defeasible deontic reasoning. We show how traffic signal control is implemented in EVALPSN, taking a simple intersection in Japan as an example. Simulation results comparing EVALPSN traffic signal control to fixed-time traffic signal control are also provided.

Kazumi Nakamatsu, Toshiaki Seno, Jair Minoro Abe, Atsuyuki Suzuki
Transporting CAN Messages over WATM

A new method for fixed wireless CAN networking is presented in this paper, exploiting the advantages of WATM as an over-the-air protocol, which include a fixed, small cell size and connection negotiation. CAN-over-WATM mapping issues using an encapsulation technique are also explained. A performance analysis of the proposed scheme is presented through computer simulation results obtained with OPNET Modeler.

Ismail Erturk
A Hybrid Intrusion Detection Strategy Used for Web Security

This paper describes a novel framework for intrusion detection systems used for Web security. A hierarchical structure is proposed to gain both server-based and network-based detection. The system consists of three major components. First, there is a host detection module (HDM) in each web server, with a collection of detection units (UC) running in the background on the host. Second, each subnet has a network detection module (NDM), which operates just like an HDM except that it analyzes network traffic. Finally, there is a central control detection module (CCDM), which serves as a high-level administrative center. The CCDM receives reports from the various HDM and NDM modules and detects intrusions by processing and correlating these reports. Detection rules are inductively learned from audit records in the CCDM and distributed to each detection module.

Bo Yang, Han Li, Yi Li, Shaojun Yang
Mining Sequence Pattern from Time Series Based on Inter-relevant Successive Trees Model

In this paper, a novel method is proposed to discover frequent patterns from time series. It first segments the time series based on perceptually important points, then converts the segments into meaningful symbol sequences by their relative scope, and finally uses a new mining model to find frequent patterns. Compared with previous methods, this method is simpler and more efficient.

Haiquan Zeng, Zhan Shen, Yunfa Hu
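
The perceptually-important-points segmentation step can be sketched directly in Python; the vertical-distance criterion and the toy series below are illustrative assumptions, and the symbolization and tree-based mining stages of the paper are omitted.

```python
def pip(series, n_points):
    """Pick the n_points most perceptually important indices of a series."""
    idx = [0, len(series) - 1]                   # start with the endpoints
    while len(idx) < n_points:
        best_i, best_d = None, -1.0
        for a, b in zip(idx, idx[1:]):
            for i in range(a + 1, b):
                # vertical distance from point i to the chord between a and b
                t = (i - a) / (b - a)
                interp = series[a] + t * (series[b] - series[a])
                d = abs(series[i] - interp)
                if d > best_d:
                    best_i, best_d = i, d
        if best_i is None:
            break                                # no more points to add
        idx = sorted(idx + [best_i])
    return idx

ts = [0, 1, 4, 2, 6, 5, 9, 3, 2, 7, 1]
print(pip(ts, 5))        # indices of the 5 most important points
```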
Backmatter
Metadata
Title
Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing
Editors
Guoyin Wang
Qing Liu
Yiyu Yao
Andrzej Skowron
Copyright Year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-39205-7
Print ISBN
978-3-540-14040-5
DOI
https://doi.org/10.1007/3-540-39205-X