scroll identifier for mobile
main-content

## Über dieses Buch

This book constitutes the refereed proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, RSKT 2011, held in Banff, Canada, in September 2011. The 89 revised full papers presented together with 3 keynote lectures and 1 invited tutorial session were carefully reviewed and selected from 229 submissions. The papers are organized in topical sections on attribute reduction and feature selection, generalized rough set models, machine learning with rough and hybrid techniques, knowledge technology and intelligent systems and applications.

## Inhaltsverzeichnis

### Mining Incomplete Data—A Rough Set Approach

A rough set approach to mining incomplete data is presented in this paper. Our main tool is an attribute-value pair block. A characteristic set, a generalization of the elementary set well-known in rough set theory, may be computed using such blocks. For incomplete data sets three different types of global approximations: singleton, subset and concept are defined. Additionally, for incomplete data sets a local approximation is defined as well.

Jerzy W. Grzymała-Busse

### Uncertainty and Feature Selection in Rough Set Theory

In rough set theory, the uncertainty of granulation and efficient feature selection algorithms have attracted much attention in recent years. We focus on the review of several common uncertainty measures and the relationships among them. An efficient accelerator is developed to accelerate a heuristic process of feature selection.

Jiye Liang

### Towards Designing Human Centric Systems: A New View at System Modeling with Granular Membership Grades

Modeling with the use of fuzzy sets (fuzzy modeling), rough sets (rough modeling) or information granules, in general, offers an interesting modeling alternative. Fuzzy modeling has been around for several decades and over this time there have been a number of interesting conceptual developments, modeling architectures, and algorithmic pursuits. Fuzzy sets used as integral components are regarded as numeric constructs. Subsequently, in light of this, models are inherently numeric. The objective here is to study a new avenue of fuzzy modeling - granular fuzzy modeling where instead of being numeric, fuzzy sets are represented by some granular counterparts such as e.g., interval fuzzy sets, shadowed sets, or fuzzy fuzzy (fuzzy

2

) sets. Several ways of constructing (transforming) numeric membership functions into granular constructs (granular mappings) are conceptualized along with the detailed algorithmic aspects.

Regarding granular fuzzy models in which granular membership functions are utilized, two representative studies are presented. The first one is concerned with a granular interpretation of temporal data where the role of information granularity is profoundly visible when effectively supporting human-centric description of relationships existing in data. In the second study, we focus on the Analytic Hierarchy Process (AHP) studied in decision-making.

Witold Pedrycz

### Sufficiently Near Sets of Neighbourhoods

The focus of this paper is on sets of neighbourhoods that are sufficiently near each other as yet another way to consider near sets. This study has important implications in M. Katětov’s approach to topologising a set. A pair of neighbourhoods of points are sufficiently near, provided that the C̆ech distance between the neighbourhoods is less than some number

ε

. Sets of neighbourhoods are sufficiently near, provided the C̆ech distance between the sets of neighbourhoods is less than some number

ε

.

James F. Peters

### History of Set Theory and Its Extensions in the Context of Soft Computing

The arithmetization programs of mathematicians in later half of nineteenth century showed that it is necessary to rebuild analysis. Riemanns attempt to find necessary and sufficient conditions for representation of a function by its Fourier series led to the recognization of different types of infinite sets. This fact motivated Georg Cantor to develop unified theory of sets. In fact Cantors interest in set theory stemmed from his researches on trigonometric series in general and fourier series in particular. Nowadays, theory of sets and functions forms the foundation upon which the structure of modern mathematics is built.

Lotfi A. Zadeh, in 1965, published his seminal paper on Fuzzy Sets. The membership in a fuzzy set is not a matter of affirmation or denial, but rather a matter of degree. The significance of Zadehs work is that it led to study of uncertainty from most general point of view, restricting classical probability theory to study certain types of uncertainty only. In the process, Aristotelian two valued logic is generalized to Fuzzy Logic, which is an infinite valued logic. During last four decades researchers realized the tremendous scope of fuzzy sets and fuzzy logic in Soft Computing. Neural networks, Neuro-Fuzzy Modeling, Neuro-Fuzzy Control, Fuzzy Databases and Information retrieval Systems, Fuzzy Decision Making are some of the important areas of research persued globally.

Z. Pawlak, in the year 1982, proposed Rough set theory, as a new mathematical approach to imperfect knowledge. Rough set theory is complementary to and can also be often used jointly with other approaches such as statistical methods, neural networks, genetic algorithms, fuzzy sets etc. Rough set approach is found important for Artificial Intelligence and Cognitive Sciences including data mining.

In this paper, we propose comparative advantages of fuzzy sets and rough sets in the context of Soft Computing.

Sarjerao Nimse, Pawan Lingras

### Comparison of Classical Dimensionality Reduction Methods with Novel Approach Based on Formal Concept Analysis

In the paper we deal with dimensionality reduction techniques for a dataset with discrete attributes. Dimensionality reduction is considered as one of the most important problems in data analysis. The main aim of our paper is to show advantages of a novel approach introduced and developed by Belohlavek and Vychodil in comparison of two classical dimensionality reduction methods which can be used for ordinal attributes (CATPCA and factor analysis). The novel technique is fundamentally different from existing ones since it is based on another kind of mathematical apparatus (namely, Galois connections, lattice theory, fuzzy logic). Therefore, this method is able to bring a new insight to examined data. The comparison is accompanied by analysis of two data sets which were obtained by questionnaire survey.

Eduard Bartl, Hana Rezankova, Lukas Sobisek

### Rule-Based Estimation of Attribute Relevance

We consider estimation of relevance of attributes used for classification. This estimation takes into account the predictive capabilities of the attributes. To this end, we are using Bayesian confirmation measure. The estimation is based on analysis of rule classifiers in classification tests. The attribute relevance measure increases when more rules involving this attribute suggest a correct decision, or when more rules that do not involve this attribute suggest an incorrect decision in the classification test; otherwise, the attribute relevance measure is decreasing. This requirement is satisfied by a monotonic Bayesian confirmation measure. Usefulness of the presented measure is verified experimentally.

Jerzy Błaszczyński, Roman Słowiński, Robert Susmaga

### Applications of Approximate Reducts to the Feature Selection Problem

In this paper we overview two feature rankings methods that utilize basic notions from the rough set theory, such as the idea of the decision reducts. We also propose a new algorithm, called Rough Attribute Ranker. In our approach, the usefulness of features is measured by their impact on quality of the reducts that contain them. We experimentally compare the reduct-based methods with several classic attribute rankers using synthetic, as well as real-life high dimensional datasets.

Andrzej Janusz, Sebastian Stawicki

### Dependence and Algebraic Structure of Formal Contexts

Formal concept analysis is an important approach of knowledge representation and data analysis. This paper focus on the dependence among formal contexts with common object set. A partial order relation among formal contexts is first introduced and its properties are examined. Subsequently, The notion of independence of a formal context is proposed by which attribute reduction of formal context is investigated. An useful conclusion is obtained, that is, all formal contexts with common object set form a lattice.

Tong-Jun Li, Ying-Xue Wu, Xiaoping Yang

### Optimal Sub-Reducts with Test Cost Constraint

Cost-sensitive learning extends classical machine learning by considering various types of costs, such as test costs and misclassification costs, of the data. In many applications, there is a test cost constraint due to limited money, time, or other resources. It is necessary to deliberately choose a set of tests to preserve more useful information for classification. To cope with this issue, we define optimal sub-reducts with test cost constraint and a corresponding problem for finding them. The new problem is more general than two existing problems, namely the minimal test cost reduct problem and the 0-1 knapsack problem, therefore it is more challenging than both of them. We propose two exhaustive algorithms to deal with it. One is straightforward, and the other takes advantage of some properties of the problem. The efficiencies of these two algorithms are compared through experiments on the mushroom dataset. Some potential enhancements are also pointed out.

Fan Min, William Zhu

### An Efficient Fuzzy-Rough Attribute Reduction Approach

Fuzzy rough set method provides an effective approach to data mining and knowledge discovery from hybrid data including categorical values and numerical values. However, its time-consumption is very intolerable to analyze data sets with large scale and high dimensionality. In this paper, we propose a strategy to improve a heuristic process of fuzzy-rough feature selection. Experiments show that this modified algorithm is much faster than its original version. It is worth noting that the performance of the modified algorithm becomes more visible when dealing with larger data sets.

Yuhua Qian, Chao Li, Jiye Liang

### A Novel Attribute Reduction Approach Based on the Object Oriented Concept Lattice

Attribute reduction is one basic issue in the analysis of information tables. In this paper, the approaches to attribute reduction in formal context based on the object oriented concept lattice are investigated. We first introduce the notions of context matrix and the operations of corresponding column vectors. Then present some judgment theorems for attribute reduction in formal contexts. Based on the judgment theorems, we propose an attribute reduction approach and show concrete reduction algorithm.

Mingwen Shao, Li Guo, Lan Li

### Rough-Set-Inspired Feature Subset Selection, Classifier Construction, and Rule Aggregation

We consider a rough-set-inspired framework for deriving feature subset ensembles from data. Each of feature subsets yields a single classifier, basically by generating its corresponding if-then decision rules from the training data. Feature subsets are extracted according to a simple randomized algorithm, following the filter (rather than wrapper or embedded) methodology. Classifier ensemble is built from single classifiers by defining aggregation laws on top of decision rules. We investigate whether rough-set-inspired methods can help in the steps of formulating feature subset optimization criteria, feature subset search heuristics, and the strategies of voting among classifiers. Comparing to our previous research, we pay a special attention to synchronization of the filter-based criteria for feature subset selection and extraction of rules basing on the obtained feature subsets. The overall framework is not supposed to produce the best-ever classification results, unless it is extended by some additional techniques known from the literature. Our major goal is to illustrate in a possibly simplistic way some general interactions between the above-mentioned criteria.

Dominik Ślęzak, Sebastian Widz

### A Constructive Feature Induction Mechanism Founded on Evolutionary Strategies with Fitness Functions Generated on the Basis of Decision Trees

In the paper, we present a novel approach to calculating a new (extra) attribute (feature) using a constructive feature induction mechanism. The problem being solved is founded on coefficients for values of existing attributes determined empirically using evolutionary strategies with fitness functions based on parameters calculated from decision trees generated for extended decision tables.

Mariusz Wrzesień, Wiesław Paja, Krzysztof Pancerz

### An Efficient Fuzzy Rough Approach for Feature Selection

Rough set theory is a powerful tool for feature selection. To avoid the information loss by discretization in rough sets, fuzzy rough sets are used to deal with the continuous values. However, the cost of computation of the approach is too high to be worked out as the number of selected features increases. In this paper, a new computational method is proposed to approximate the conditional mutual information between the selected features and the decision feature, and thus improve the efficiency and decrease the complexity of the classical fuzzy rough approach based on mutual information. Extensive experiments are conducted on the large-sized coal-fired power units dataset with steady state, and the experimental results confirm the efficiency and effectiveness of the proposed algorithm.

Feifei Xu, Weiguo Pan, Lai Wei, Haizhou Du

### Partitions, Coverings, Reducts and Rule Learning in Rough Set Theory

When applying rough set theory to rule learning, one commonly associates equivalence relations or partitions to a complete information table and tolerance relations or coverings to an incomplete table. Such associations are sometimes misleading. We argue that Pawlak three-step approach for data analysis indeed uses both partitions and coverings for a complete information table. A slightly different formulation of Pawlak approach is given based on the notions of attribute reducts of a classification table, attribute reducts of objects and rule reducts. Variations of Pawlak approach are examined.

Yiyu Yao, Rong Fu

### A Rough Set Approach to Feature Selection Based on Relative Decision Entropy

Rough set theory has been recognized to be one of the powerful tools for feature selection. The essence of rough set approach to feature selection is to find a minimal subset (or called reduct) of original features, which can predict the decision concepts as well as the original features. Since finding a minimal feature subset is a NP-hard problem, many heuristic algorithms have been proposed, which usually employ the feature significance in rough sets as heuristic information. Shannon’s information theory provides us a feasible way to measure the information of data sets with entropy. Hence, many researchers have used information entropy to measure the feature significance in rough sets, and proposed different information entropy models in rough sets. In this paper, we present a new information entropy model, in which relative decision entropy is defined. Based on the relative decision entropy, we propose a new rough set algorithm to feature selection. To verify the efficiency of our algorithm, experiments are carried out on some UCI data sets. The results demonstrate that our algorithm can provide competitive solutions efficiently.

Lin Zhou, Feng Jiang

### A Variable Precision Covering Generalized Rough Set Model

The covering generalized rough sets are an improvement of traditional rough set model to deal with more complex practical problems which the traditional one cannot handle. A variable precision extension of a covering generalized rough set model is proposed in this paper. Some properties are investigated.

Xinwei Zheng, Jian-Hua Dai

### Dominance-Based Rough Set Approach on Pairwise Comparison Tables to Decision Involving Multiple Decision Makers

In this paper, we present a rough set approach to pairwise comparison tables for supporting decisions of multiple decision makers. More precisely, we deal with preference learning from pairwise comparisons, in case of multiple decision makers. Comparing to classical rough set approach, there are three main differences that are the following: we are learning a preference relation, so we have to work with a pairwise comparison table, while the classical rough set approach considers a classification table; we are taking into account a preference order in data, so we have to use the Dominance-based Rough Set Approach (DRSA), while the classical rough set approach based on equivalence relation does not consider such an order; we are taking into account multiple decision makers, while the classical rough set approach considers mostly a single classification decision provided by one decision maker only.

Salvatore Greco, Benedetto Matarazzo, Roman Słowiński

### Generalized Parameterized Approximations

We study generalized parameterized approximations, defined using both rough set theory and probability theory. The main objective is to study, for a given subset of the universe

U

, all such parameterized approximations, i.e., for all parameter values. For an approximation space (

U

,

R

), where

R

is an equivalence relation, there is only one type of such parameterized approximations. For an approximation space (

U

,

R

), where

R

is an arbitrary binary relation, three types of parameterized approximations are introduced in this paper: singleton, subset and concept. We show that the number of parameterized approximations of given type is not greater than the cardinality of

U

. Additionally, we show that singleton parameterized approximations are not useful for data mining, since such approximations, in general, are not even locally definable.

Jerzy W. Grzymała-Busse

### Transversal and Function Matroidal Structures of Covering-Based Rough Sets

In many real world applications, information blocks form a covering of a universe. Covering-based rough set theory has been proposed to deal with this type of information. It is more general and complex than classical rough set theory, hence there is much need to develop sophisticated structures to characterize covering-based rough sets. Matroids are important tools for describing graphs and linear independence of matrix theory. This paper establishes two matroidal structures of covering-based rough sets. Firstly, the transversal matroidal structure of a family of subsets of a universe is constructed. We also prove that the family of subsets of a universe is a covering if and only if the constructed transversal matroid is a normal one. Secondly, the function matroidal structure is constructed through the upper approximation number. Moreover, the relationships between the two matroidal structures are studied. Specifically, when a covering of a universe is a partition, these two matroidal structures coincide with each other.

Shiping Wang, William Zhu, Fan Min

### Some Fuzzy Topologies Induced by Rough Fuzzy Sets

A rough fuzzy set is the result of approximation of a fuzzy set with respect to a crisp approximation space. In this paper, we investigate topological structures of rough fuzzy sets. We first show that a reflexive crisp rough approximation space can induce a fuzzy Alexandrov space. We then prove that the lower and upper rough fuzzy approximation operators are, respectively, the fuzzy interior operator and fuzzy closure operator if and only if the binary relation in the crisp approximation space is reflexive and transitive. Finally, we examine that a similarity crisp approximation space can produce a fuzzy clopen topological space.

Wei-Zhi Wu, Yu-Fang Yang, You-Hong Xu

### Neighborhood Rough Sets Based Matrix Approach for Calculation of the Approximations

Approximations of a concept in neighborhood rough sets (NRS) are used to induce rules and extract feature subset from hybrid information systems. To compute the approximations conveniently, this paper presents a new matrix view of NRS and the matrix representation of the lower and upper approximations in NRS. To be robust for tolerating the noisy samples in the data, a probabilistic neighborhood rough sets (PNRS) is proposed. According to the matrix view, a new method as well as its algorithm is presented for calculation of the lower and upper approximations under NRS and PNRS.

Junbo Zhang, Tianrui Li, Yan Yang, Lei Wang

### Case-Based Classifiers with Fuzzy Rough Sets

Fuzzy rough sets are widely studied and applied in the domain of machine learning and data mining these years. In this work, we design a case-based classifier with fuzzy rough set theory. The new classifier takes lower approximation of fuzzy rough sets as the theoretic foundation of selecting cases and case reasoning. Some numerical experiments are conducted to show the effectiveness of the proposed algorithm.

Shuang An, Qinghua Hu, Daren Yu

### Comparison of Greedy Algorithms for α-Decision Tree Construction

A comparison among different heuristics that are used by greedy algorithms which constructs approximate decision trees (

α

-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it.

Abdulaziz Alkhalid, Igor Chikalov, Mikhail Moshkov

### Constructing an Optimal Decision Tree for FAST Corner Point Detection

In this paper, we consider a problem that is originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion.

Abdulaziz Alkhalid, Igor Chikalov, Mikhail Moshkov

### Incremental Learning in AttributeNets with Dynamic Reduct and IQuickReduct

Incremental learning is becoming more essential in the real world problems in which a decision system is being updated frequently. AttributeNets is a classifier whose representation allows updating the classifier when new data is added incrementally. In this paper the impact of reduct on the performance of AttributeNets as an Incremental Classifier is investigated. This philosophy has been demonstrated by adopting two varieties of reducts, namely dynamic reduct and IQuickReduct. These reducts were used to study the capability of AttributeNets for classification with reduced attributes.

P. S. V. S. Sai Prasad, K. Hima Bindu, C. Raghavendra Rao

### LEM2-Based Rule Induction from Data Tables with Imprecise Evaluations

We propose a rough set approach to data tables with imprecise evaluations. Treatments of imprecise evaluations are described and four rule induction schemas are proposed. For each rule induction schema, we apply LEM2-based rule induction algorithm by defining the positive object set appropriately. The performances of the proposed rule induction algorithms are examined by numerical experiments.

Masahiro Inuiguchi, Masahiko Tsuji, Yoshifumi Kusunoki, Masayo Tsurumi

### An Extension to Rough c-Means Clustering

The original form of the Rough

c

-means algorithm does not distinguish between data points in the boundary area. This paper presents an extended Rough

c

-means algorithm in which the distinction between data points in the boundary area is captured and used in the clustering procedure. Experimental results indicate that the proposed algorithm can yield more desirable clustering results in comparison to the original form of the Rough

c

-means algorithm.

Fan Li, Qihe Liu

### A Modified Cop-Kmeans Algorithm Based on Sequenced Cannot-Link Set

Clustering with instance-level constraints has received much attention in the clustering community recently. Particularly, must-Link and cannot-Link constraints between a given pair of instances in the data set are common prior knowledge incorporated in many clustering algorithms today. This approach has been shown to be successful in guiding a number of famous clustering algorithms towards more accurate results. However, recent work has also shown that incorporation of must-link and cannot-link constraints makes clustering algorithms too much sensitive to ”assignment order of instances” and therefore results in consequent constraint-violation. In this paper, we propose a modified version of Cop-Kmeans which relies on a sequenced assignment of cannot-linked instances. In comparison with original Cop-Kmeans, experiments on four UCI data sets indicate that our method could effectively overcome the problem of ”constraint-violation”, yet with almost the same performance as that of Cop-Kmeans algorithm.

Tonny Rutayisire, Yan Yang, Chao Lin, Jinyuan Zhang

### A NIS-Apriori Based Rule Generator in Prolog and Its Functionality for Table Data

Rough

Non

-

deterministic

Information

Analysis

(

RNIA

) is a rough set based framework for handling several kinds of incomplete information. In our previous research on

RNIA

, we gave definitions according to two modal concepts, the

certainty

and the

possibility

, and thoroughly investigated their mathematical properties. For rule generation in

RNIA

, we proposed

NIS

-

Apriori

algorithm, which is an extended

Apriori

algorithm. Our previous implementation of

NIS

-

Apriori

in

C

suffered from a lack of clarity caused by difficulties in expressing non-deterministic information by procedural languages. Therefore, we recently decided to improve the algorithm’s design and re-implement it in Prolog. This paper reports the current state of our algorithmic framework and outlines some new aspects of its functionality.

Hiroshi Sakai, Michinori Nakata, Dominik Ślęzak

### Towards a Practical Approach to Discover Internal Dependencies in Rule-Based Knowledge Bases

In this paper, we intend to introduce the conception of discovering the knowledge about rules saved in large rule-based knowledge bases, both generated automatically and acquired from human experts in the classical way. This paper presents a preliminary study of a new project in which we are going to join the two approaches: the hierarchical decomposition of large rule bases using cluster analysis and the decision units conception. Our goal is to discover useful, potentially implicit and directly unreadable information from large rule sets.

Roman Simiński, Agnieszka Nowak-Brzezińska, Tomasz Jach, Tomasz Xięski

### Discovering Patterns of Collaboration in Rough Set Research: Statistical and Graph-Theoretical Approach

In the paper we present some approach to analysis of large social networks and apply it to so called Pawlak collaboration graph. In order to build such graph we use data collected in the Rough Set Database System. Analyzing this data by means of own computer program, we discover some interesting patterns of collaboration among members of the rough set community represented by this graph.

Zbigniew Suraj, Piotr Grochowalski, Łukasz Lew

### Comparing a Clustering Density Criteria of Temporal Patterns of Terms Obtained by Different Feature Sets

In this paper, we present a comparison of similarity measures of temporal patterns of term usages in sets of documents. In order to find out remarkable trends in sets of documents, temporal text mining methods have been developed by combining text mining and temporal pattern extraction. However, most conventional methods are not treated temporal behaviors of the values of importance indices explicitly. By separating the calculation process of the importance indices and the temporal pattern extraction, we have developed a method for finding temporal patterns of term usages based on the importance indices. Then, we obtained two sets of temporal patterns from a set of bibliographical documents by using two sets of features; one is based on the values of each index, the other is the features of the linear trends based on linear regression. After obtaining the two sets of the temporal patterns, we compare a similarity measure between the terms included in each set of temporal patterns.

Hidenao Abe, Shusaku Tsumoto

### Similarity of Query Results in Similarity-Based Databases

We study estimations of similarity of query results in a similarity-based database model. The model results as extension of Codd’s model of data in which domains are additionally equipped with similarity relations and tuples in data tables have ranks indicating degrees to which tuples match similarity-based queries. We present ranked-based and tuple-based similarity of data tables, similarity closures, prove that relational operations in our model preserve similarity, and provide formulas for estimating similarity of query results based on similarity of input data. Most of the proofs are only sketched or omitted because of the limited scope of the paper.

Radim Belohlavek, Lucie Urbanova, Vilem Vychodil

### Rough Set Based Quality of Service Design for Service Provisioning in Clouds

Quality of Service (QoS) is a broad term used to describe the overall experience a user or application will receive over a network. A rough set based approach is used to design a modified Cloud-QoS Management Strategy (MC-QoSMS). MC-QoSMS is a component of cloud broker that is used to allocate resources based on Service Level Agreement between users and providers for Infrastructure as a Service (IaaS) provisioning of cloud. Concept of reduct from rough set theory is used to allocate the best service provider to the cloud’s user with minimum searching time. The performance of the proposed system has been analyzed in terms of number of requests. It is reported that the system outperformed random algorithm by 25% and the round robin algorithm by 30% for 100 requests.

Praveen Ganghishetti, Rajeev Wankar, Rafah M. Almuttairi, C. Raghavendra Rao

### GTrust: A Distributed Trust Model in Multi-Agent Systems Based on Grey System Theory

Trust model is a key problem in Multi-Agent System. In this paper, a distributed trust model called GTrust is constructed to help agents choose partners in their interactions. The grey system theory, which is used to deal with uncertainty problems, is introduced into this model. A method is designed to rate the witness and also a grey fixed weight clustering approach is proposed to determine the trust to the witnesses. Simulation experiments show that the GTrust can achieve better results in either static environment, or dynamic environment.

Lijian He, Houkuan Huang, Xingye Dong

### Linear Necessity Measures and Their Applications to Possibilistic Linear Programming

In this paper, necessity measures defined by two linear modifier functions, which we call “Linear Necessity Measure”, are studied. We investigate implication functions corresponding to linear necessity measures and apply linear necessity measures to possibilistic linear programming problems. We describe implication functions generated from two linear modifier generating functions, and classify them. We show that necessity measure constrained models with linear necessity measures of fuzzy linear programming problems are solved easily in many cases.

Masahiro Inuiguchi, Tatsuya Higuchi, Masayo Tsurumi

### Remarks on Pairwise Comparison Numerical and Non-numerical Rankings

A relationship between pairwise comparison classical numerical ranking and a partial order based pairwise comparison non-numerical ranking is discussed. A consistent non-numerical ranking that is equivalent to a given consistent numerical ranking is constructed and the correctness of this construction is proven. A new concept of consistency for the pairwise comparison classical numerical ranking is given.

Ryszard Janicki, Yun Zhai

### Community-Based Relational Markov Networks in Complex Networks

Relational Markov networks (RMNs) are a joint probabilistic model for an entire collection of related entities. The model is able to mine relational data effectively by integrating information from content attributes of individual entities as well as the links among them, yet the prediction accuracy is greatly affected by the definition of the relational clique templates. Maximum likelihood estimation (MLE) is used to estimate the model parameters, but this can be quite costly because multiple rounds of approximate inference are required over the entire dataset. In this paper, we propose constructing RMNs basing on the community structures of complex networks, and present a discriminative maximum pseudolikelihood estimation (DMPLE) approach for training RMNs. Experiments on the collective classification and link prediction tasks on some real-world datasets show that our approaches perform well in terms of accuracy and efficiency.

Huaiyu Wan, Youfang Lin, Caiyan Jia, Houkuan Huang

### Applying Multi-Criteria Decision Analysis to Global Software Development with Scrum Project Planning

Distributed Software Development (DSD) projects have become a common reality for many organizations. Scrum is a consolidated Agile methodology and has been increasingly used in a distributed fashion. As distance makes difficult to interact and to cooperate effectively, it is paramount to use methodologies like Scrum that emphasizes communication, reduces coordination and control overhead. Successfully planning and managing the combined use of DSD and Scrum is a complex task and requires carefully planning. Despite the importance and complexity of this type of problem, there seems to be a lack of reports, in the literature, of models that could support managers dealing with such decision context. This paper applies a multi-criteria decision model on the choice of DSD Scrum project plans that have a better chance of success. The model, presented in [1], was developed using cognitive mapping and MACBETH[2]. The application of the model is demonstrated, followed by conclusion and future work.

Luis Henrique Almeida, Plácido Rogério Pinheiro, Adriano Bessa Albuquerque

### Accuracy Evaluation of the System of Type 1 Diabetes Prediction

Based on genetic data we created the decision support system to classify and later on to predict the illness among the children with genetic susceptibility to DMT1. The system can recommend including a person to pre-diabetes therapy. While creating the system the classification problems appeared. Some of the algorithms based on the rough set theory have been applied to improve the classification accuracy.

Rafał Deja

### Driver Status Recognition by Neighborhood Covering Rules

Driver fatigue recognition based on computer vision is considered as a challenging issue. Though human face carries most information related to human status, the information is redundant and overlapped. In this work, we concentrate on several fatigue indicating areas and three types of features are extracted from them. Then a neighborhood rough set technique is introduced to evaluate quality of candidate features and select the effective subset. A rule learning classifier based on neighborhood covering reduction is employed for the classification task. Compared with classic classifiers, the designed recognition system performs well. The experiments are presented to show the effectiveness of the proposed technique.

Yong Du, Qinghua Hu, Peijun Ma, Xiaohong Su

### Application of Gravitational Search Algorithm on Data Clustering

Data clustering, the process of grouping similar objects in a set of observations is one of the attractive and main tasks in data mining that is used in many areas and applications such as text clustering and information retrieval, data compaction, fraud detection, biology, computer vision, data summarization, marketing and customer analysis, etc. The well-known k-means algorithm, which widely applied to the clustering problem, has the drawbacks of depending on the initial state of centroids and may converge to the local optima rather than global optima. A data clustering algorithm based on the gravitational search algorithm (GSA) is proposed in this research. In this algorithm, some candidate solutions for clustering problem are created randomly and then interact with one another via Newton’s gravity law to search the problem space. The performance of the presented algorithm is compared with three other well-known clustering algorithms, including k-means, genetic algorithm (GA), and particle swarm optimization algorithm (PSO) on four real and standard datasets. Experimental results confirm that the GSA is a robust and viable method for data clustering.

Abdolreza Hatamlou, Salwani Abdullah, Hossein Nezamabadi-pour

### Application of Rough Sets in GIS Generalization

This paper proposes a method to solve the selection problem in GIS generalization by leveraging the rough sets theory for attribute reduction. In specific, by taking into account the special characteristics of the GIS spatial data, our method can be outlined as follows. First, we discretize the continuous-valued attributes through unsupervised discretization method; Second, we classify in a fuzzy manner the spatial objects, whose result will then serve as the decisional attributes; Third, we evaluate the respective importance of these attributes through the attribute reduction method borrowed from the rough sets theory and consequently we conduct a dynamic sorting according to the resulting importance values. Through experimentation results, the effectiveness performance of our proposed method is validated.

Wenjing Li, Jia Qiu, Zhaocong Wu, Zhiyong Lin, Shaoning Li

### Application of Rough Set Theory for Evaluating Polysaccharides Extraction

This paper reports an application of rough set theory for evaluating polysaccharides extraction from apple pomace. The importance of factors affecting polysaccharides yield is analyzed by the means of attribute reduction. The significances of four factors are obtained. It is found that extraction time and ratio of solution to sample are the dominant factors. The results show that rough set theory can effectively analyze and evaluate factors influencing polysaccharides yield from apple pomace and can be used in the analysis of extraction of functional components in foods.

Shuang Liu, Lijun Sun, Yurong Guo, Jialin Gao, Lei Liu

### Identification of Print Technology Based on Homogeneous Regions of Image

Tampering of documents is increasing monotonically by posing challenges to forensic scientist. There is a great need to develop alternative solutions for forensic characterization of printers. This paper analyses documents printed by various printers and characterizes them for identification purposes. Present study focuses on developing a method that characterizes print technology based on spatial variability. Homogeneous colour region of images are taken as sample and the data generated is taken as input to Reduct based Decision Tree (RDT), which gives rules to identify the source printer for the given test data. This paper provides GVMPT algorithm for print technology identification and it also demonstrates the influence of the choice of parameters in capturing the print technology of document based on image region.

Umadevi Maramreddi, Arun Agarwal, C. Raghavendra Rao

### Ant Based Clustering of MMPI Data - An Experimental Study

Our research concerns psychometric data coming from the Minnesota Multiphasic Personality Inventory (MMPI) test. MMPI is one of the most frequently used in clinical mental health personality tests as well as psychopathology (mental and behavioral disorders). We are developing the Copernicus system. It is a tool for the analysis of MMPI profiles of patients with mental disorders. In this system, there have been selected and implemented different quantitative groups of methods useful for differential interprofile diagnosis. In the paper, we investigate clustering of MMPI data using one of nature-inspired heuristic approaches - ant based clustering. For this approach, we test usefulness of different dissimilarity measures of MMPI profiles both standard and those defined in the professional domain literature.

Krzysztof Pancerz, Arkadiusz Lewicki, Ryszard Tadeusiewicz, Jerzy Gomuła

### Detection of Cancer Patients Using an Innovative Method for Learning at Imbalanced Datasets

Most of standard learning algorithms presume or at least expect that distributions governed on the different classes of dataset are balanced. Also they presume that the misclassification cost of each data point is equal without considering its class. These algorithms fail to learn at the imbalanced datasets. Cancer detection is a well-known domain in which it is very common to face imbalanced class distributions. This paper presents an algorithm which is suit to this field, in both speed and efficacy. The experimental results show that the performance of the proposed algorithm outperforms some of the best methods in the field.

Hamid Parvin, Behrouz Minaei-Bidgoli, Hosein Alizadeh

### Information Reuse in Hospital Information Systems: A Similarity-Oriented Data Mining Approach

This paper presents application of data mining to data stored in a hospital information system in which temporal behavior of global hospital activities are visualized. The results show that the reuse of stored data will give a powerful tool for hospital management and lead to improvement of hospital services.

Shusaku Tsumoto, Shoji Hirano

### A Model-Based Decision Support Tool Using Fuzzy Optimization for Climate Change

This paper proposes a model-based decision support tool using fuzzy optimization for assessing regional sustainability development (RSD) under climate change. The proposed tool integrates fuzzy goal programming (FGP) model with a fuzzy analytic hierarchy process (FAHP) to determine the optimal policy for RSD under climate change in the agriculture sector. The FAHP is used to handle the non-cost criteria, aspiration levels and to find the relative weights of multiple conflicting objectives in FGP model by introducing a linguistic variable that model the decision maker’s preferences. The proposed tool is capable for providing different alternative policies based on the degree of uncertainty. The proposed FGP-FAHP is able to allow the decision makers to test different sustainability development policies at deferent levels such as regional, sub-regions, goals/indicators and sub-goals.

Omar S. Soliman, Aboul Ella Hassanien, Neveen I. Ghali, Nashwa El-Bendary, Ruhul A. Sarker

### Clustering of Rough Set Related Documents with Use of Knowledge from DBpedia

A case study of semantic clustering of scientific articles related to Rough Sets is presented. The proposed method groups the documents on the basis of their content and with assistance of DBpedia knowledge base. The text corpus is first treated with Natural Language Processing tools in order to produce vector representations of the content and then matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered in order to form partition of papers that share semantic relatedness. The steps in textual data preparation, utilization of DBpedia and clustering are explained and illustrated with results of experiments performed on a corpus of scientific documents about rough sets.

Marcin Szczuka, Andrzej Janusz, Kamil Herba

### Case-Based Reasoning Using Dominance-Based Decision Rules

Case-based Reasoning (CBR) is a process of inferring conclusions related to a new situation by the analysis of similar cases known from the past experience. We propose to adopt in this process the Dominance-based Rough Set Approach (DRSA), that is able to handle monotonicity relationships of the type “the more similar is object

y

to object

x

with respect to the considered features, the closer is

y

to

x

in terms of the membership to a given fuzzy set

X

”. At the level of marginal similarity concerning single features, we consider this similarity in ordinal terms only. The marginal similarities are aggregated within decision rules underlying the general monotonicity property of comprehensive closeness of objects with respect to their marginal similarities.

Marcin Szeląg, Salvatore Greco, Jerzy Błaszczyński, Roman Słowiński

### RoSetOn: The Open Project for Ontology of Rough Sets and Related Fields

In the paper, we initialize the open project for ontology of rough sets and related fields called RoSetOn. In our opinion, there is a need in rough set society to create such ontology. This paper tries to open discussion and actions concerning this task and the first step was made. At the beginning, we assume formal mathematical description of the ontology using a graph structure. Graphs are a powerful and universal tool widely used in information processing. In the paper, we present a scheme of the proposed ontology and one of possible applications of it.

Zbigniew Suraj, Piotr Grochowalski

### Fuzzy Description of Air Quality: A Case Study

The methods suggested by US -EPA for computation of air quality indices do not include expert’s knowledge. There exists aleotary uncertainty in the pollution parametric data and the epistemic uncertainty in describing the pollutants by the domain experts in linguistic terms such as good, very good, etc. Fuzzy logic based formalism presented in this paper can model two types of uncertainties, and finally straightway describe air quality in linguistic terms with a degree of certainty attached to each term.

The case study relates to the assessment of the status of ambient air quality in Pune city at the defined air quality monitoring stations, using fuzzy logic based formalism. The comparison of the results obtained using the conventional method of computing air quality index and the proposed fuzzy logic based method is an integral part of the paper.

Jyoti Y. Yadav, Vilas Kharat, Ashok Deshpande

### A Robust Face Recognition Method Based on AdaBoost, EHMM and Sample Perturbation

Face recognition is a classical topic in pattern classification, although there are already some good methods and applications, robust face recognition methods are always pursued. In this paper, based on AdaBoost, embedded hidden Markov model(EHMM), and sample perturbation, a novel and robust face recognition method is proposed. Experiments results show that the proposed method can get higher recognition rate on benchmark datasets. Furthermore, the proposed method show robustness on the test samples with different illumination and shelter.

Yong Yang, Kan Tian, Zhengrong Chen

### Roughness Approach to Color Image Segmentation through Smoothing Local Difference

Aiming at the problems of histogram-based thresholding, rough set theory is applied to construct the roughness measure for segmenting color image. However, the extant roughness measure is a qualitative description of neighborhood similarity and tends to over focus on the trivial homogeneity. An improved roughness measure is proposed in this paper. The novel roughness is computed from smoothed local differences and quantified homogeneity, thus can form the accurate representation of homogeneous regions. The experimental results indicate that the segmentation based on improved roughness has good performances on most testing images.

Xiaodong Yue, Duoqian Miao, Yufei Chen, Hongzhong Chen

### On Local Inclusion Degree of Intuitionistic Fuzzy Sets

This paper mainly introduce the definition of local inclusion degree of IF sets in terms of the elementwise values, which is considered a locally determined inclusion degree. We then investigate a series of properties of the local inclusion degree. Furthermore, the measure of similarity between intuitionistic fuzzy sets is discussed on the basis of local inclusion degree.

Lei Zhou

### Analysis of Data-Driven Parameters in Game-Theoretic Rough Sets

The game-theoretic rough set (GTRS) model provides an alternative approach to the derivation of probabilistic rough set regions. Whereas other models rely on either user-provided parameters or notions of cost for the date set in question, the GTRS model learns these parameters through a game-theoretic process. The parameters can be of the form of probabilities that determine the rough set region bounds or they can be superseded by classification measures whose values represent the current health of the classification system. In this article, we will be analyzing the relationship between the calculated parameters and the learned values of the loss functions. We demonstrate the effectiveness of the game-theoretic rough set model in performing data analysis.

Joseph P. Herbert, JingTao Yao

### An Optimization Viewpoint of Decision-Theoretic Rough Set Model

This paper considers an optimization viewpoint of decision-theoretic rough set model. An optimization problem is proposed by considering the minimization of the decision cost. Based on the optimization problem, cost functions and thresholds used in decision-theoretic rough set model can be learned from the given data automatically. An adaptive learning algorithm Alcofa is proposed. Another significant inference drawn from the solution of the optimization problem is a minimum cost based attribute reduction. The attribute reduction can be interpreted as finding the minimal attribute set to make the decision cost minimum. The optimization viewpoint can bring some new insights into the research on decision-theoretic rough set model.

Xiuyi Jia, Weiwei Li, Lin Shang, Jiajun Chen

### Attribute Reduction in Decision-Theoretic Rough Set Model: A Further Investigation

The monotonicity of positive region in PRS (Pawlak Rough Set) and DTRS (Decision-Theoretic Rough Set) are comparatively discussed in this paper. Theoretic analysis shows that the positive region in DTRS model may expand with the decrease of the attributes, which is essentially different from that of PRS model and leads to a new definition of attribute reduction in DTRS model. A heuristic algorithm for the newly defined attribute reduction in DTRS model is proposed, in which the positive region is allowed to expand instead of remaining unchanged in the process of deleting attributes. Results of experimental analysis are included to validate the theoretic analysis and quantify the effectiveness of the proposed attribute reduction algorithm.

Huaxiong Li, Xianzhong Zhou, Jiabao Zhao, Dun Liu

### A New Discriminant Analysis Approach under Decision-Theoretic Rough Sets

Discriminant analysis is an effective methodology to deal with the classification problem. However, most common methods including binary logistic regression in discriminant analysis rarely consider the semantics explanations such as losses or costs in decision rules. From the idea of three-way decisions in decision-theoretic rough sets (DTRS), we propose a new discriminant analysis approach by combining DTRS and binary logistic regression. DTRS is utilized to systematically calculate the corresponding thresholds with Bayesian decision procedure. Meanwhile, the binary logistic regression is employed to compute the conditional probability of three-way decisions. An empirical study validates the reasonability and effectiveness of the proposed approach.

Dun Liu, Tianrui Li, Decui Liang

### Construction of α-Decision Trees for Tables with Many-Valued Decisions

The paper is devoted to the study of greedy algorithm for construction of approximate decision trees (

α

-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider bound on the number of algorithm steps, and bound on the algorithm accuracy relative to the depth of decision trees.

Mikhail Moshkov, Beata Zielosko

### Decision Making in Incomplete Information System Based on Decision-Theoretic Rough Sets

In complete information system, the universe is partitioned with equivalence relation. Given a concept we get a pair of approximations of the concept using rough set theory. An incomplete information table can be expressed as a family of complete information tables. The universe is partitioned by equivalence relation for each of complete information table. The probability of each object in universe belonging to the concept can be calculated. A decision rule is derived using the probability instead of conditional probability in decision-theoretic rough sets. At last, the unverse is divided into three regions according to the probability.

Xiaoping Yang, Haiguang Song, Tong-Jun Li

### Automatically Determining the Number of Clusters Using Decision-Theoretic Rough Set

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A fundamental and difficult problem in cluster analysis is how many clusters are appropriate for the description of a given system. The objective of this paper is to develop a method for automatically determining the number of clusters. The method firstly proposes a new clustering validity evaluation function based on the extended decision-theoretic rough set model. Then a hierarchical clustering algorithm is proposed and some conclusions are obtained in the validation of the algorithm. Experimental results show that the new clustering method can stop at the perfect number of clusters automatically and validate the change laws of the clustering validity evaluation function.

Hong Yu, Zhanguo Liu, Guoyin Wang

### A New Formulation of Multi-category Decision-Theoretic Rough Sets

As a natural extension to the rough set approximations with two decision classes, this paper provides a new formulation of multi-category decision-theoretic rough sets. A three-way decision is added to each class which gives the user the flexibility of making a deferred decision. Different misclassification errors are treated separately with the notion of loss functions from Bayesian decision theory. The losses incurred for making deferred and rejective decisions to each class are also considered. The main objective of this paper is to tackle the limitations of the previous related work, and therefore provide a more complete solution to multi-category decision making.

Bing Zhou

### Parallel Computation in Finding Near Neighbourhoods

The problem considered in this article stems from the observation that practical applications of near set theory requires efficient determination of all the tolerance classes containing objects from the union of two disjoints sets. Near set theory consists in extracting perceptually relevant information from groups of objects based on their descriptions. Tolerance classes are sets where all the pairs of objects within a set must satisfy the tolerance relation and the set is maximal with respect to inclusion. Finding such classes is a computationally complex problem, especially in the case of large data sets or sets of objects with similar features. The contribution of this article is a parallelized algorithm for finding tolerance classes using NVIDIA’s Compute Unified Device Architecture (CUDA). The parallelized algorithm is illustrated in terms of a content-based image retrieval application.

Christopher J. Henry, Sheela Ramanna

### ε-Near Collections

This paper considers the problem of how approach merotopic spaces can be used in the study of

ε

-near collections of subsets. The solution to this problem, in general, stems from M. Katĕtov’s observation that merotopic spaces are obtained by topologising certain parts of a nonempty set. In particular, a consideration of feature-based merotopic distance between collections of subsets in the context of approach spaces provides a means of quantifying the

ε

-nearness of collections within some

ε

∈ (0, ∞ ]. Using the proposed approach to measuring the nearness of collections, one can consider the merotopic distance between collections of fuzzy sets, or rough sets, or, for that matter, between collections of near sets, themselves. This opens the door to research concerning the degree of nearness of ’populations’ of different types of sets. The contribution of this paper is an application of merotopies with two arguments in measuring the nearness of collections of open neighbourhoods in digital images.

James F. Peters, Maciej Borkowski

### Nearness of Subtly Different Digital Images

The problem considered in this article is how to measure the nearness or apartness of digital images in cases where it is important to detect subtle changes in the contour, position, and spatial orientation of bounded regions. The solution of this problem results from an application of anisotropic (direction dependent) wavelets and a tolerance near set approach to detecting similarities in pairs of images. A wavelet-based tolerance Nearness Measure (tNM) makes it possible to measure fine-grained differences in shapes in pairs of images. The application of the proposed method focuses on image sequences extracted from hand-finger movement videos. Each image sequence consists of hand-finger movements recorded during rehabilitation exercises. The nearness of pairs of images from such sequences is measured to check the extent that normal hand-finger movement differs from arthritic hand-finger movement. Experimental results of the proposed approach are reported, here. The contribution of this article is an application of an anisotropic wavelet-based tNM in classifying arthritic hand-finger movement images in terms of their degree of nearness to or apartness from normal hand-finger movement images.

Leszek Puzio, James F. Peters

### A Generalization of Near Set Model

In the classical near set model, the probe functions described by a single assertion (feature) subset, which are established through a single granulation from the view of granular computing. In this work, we aim to extend Peters’s near set model to a multi-granulation near set model, in which the probe functions are some expression of the set of assertions (features) logically compounded under logic operation “

or

”. Moreover, near lower and upper approximations are defined by using the neighborhoods based on AFS description logics.

Lidong Wang, Xiaodong Liu, Xiaojuan Tian

### Gauges, Pregauges and Completions: Some Theoretical Aspects of Near and Rough Set Approaches to Data

The theory of near sets and the theory of rough sets share a common metric root. Since a probe function and an equivalence relation can be regarded as a pseudometric on

U

, in actual fact the underlying structure of both theories is a family of pseudometrics. The same starting point one can find in metric topology (e.g. [2]): an arbitrary family of pseudometrics is called a

pregauge structure

and when this family additionally separates all points, it is called a

gauge structure

. Pregauge structures characterise all completely regular spaces, whereas gauge structures correspond to all Hausdorff completely regular spaces (often called

gauge spaces

). In consequence, a perceptual system and an information system can be regarded as both pregauge structures and as topological completely regular spaces. In the paper, we would like to make a step towards gauge structures. A perceptual system or an approximation space does usually not separate all points, therefore we introduce the concept of a

completion

of a pregauge, but on the relational level: that is, a pregauge is regarded as a family

$\mathcal{E}$

of equivalence relations, and a completion of

$\mathcal{E}$

is a relation

R

$\mathcal{E}$

, makes this family separate all points of

U

. If

R

is an equivalence relation, then

$\mathcal{E}\cup R$

can be converted into a gauge structure and the corresponding topology will be Hausdorff completely regular. Since, in data analysis,

U

is finite, this topology will be discrete. Therefore, our aim is actually to find weaker topologies than gauge spaces. To this end, we allow

R

to be a tolerance relation and we consider topologies on the pregauge structure and on its completion, separately. At the end we present a simple application of these topologies.

Marcin Wolski

### Path Queries on Massive Graphs Based on Multi-granular Graph Partitioning

The quotient space theory can represent the world at different granularities and deal with complicated problems hierarchically. In this paper, we introduce a method for finding shortest path on massive graphs based on multi-granular graph partitioning. Path queries into two parts: preprocessing and query step. In preprocessing, we introduce a heuristic local community structure discovery algorithm to decompose massive graphs into some local communities and construct a sequence of hierarchical quotient spaces to describe hierarchy structure on massive graphs. In query, we improve evaluation function of heuristic search method in path query. The implementation works on massive networks. From experimental results, it can be stated that proposed algorithm is effective and efficient in transportation road networks of US.

Fu-gui He, Yan-ping Zhang, Jie Chen, Ling Zhang

### A Minimal Test Suite Generation Method Based on Quotient Space Theory

The cost and effectiveness of software testing is determined by the quantity and quality of the test suite. In order to select the most efficient and least subset from the test suite, a minimal test suite generation method based on quotient space theory is proposed. Based on the testing requirements, decomposition method and synthesis technology of the quotient space theory are applied to give a partition of the test suite, and a set of effective algorithm of the minimal test suite generation is concluded.

Lei Wu, Longshu Li

### Audio Signal Blind Deconvolution Based on the Quotient Space Hierarchical Theory

The Time-Frequency Domain Blind Deconvolution is discussed based on the quotient space theory. The domain transformation [X

w

] → [X

n

]

G

→ [X

] and the corresponding granular computing method is introduced to describe the hierarchical structure of deconvolution process, which converts the convolutive mixture of original time-domain signals into instantaneous mixtures in the quotient space. The experimental results show that the algorithm proposed in this paper achieves a relatively high separation quality.

Chao Zhang, Yuan Zhang, Xiao-pei Wu

### The Optimal Approximation of Fuzzy Tolerance Relation

A problem of the optimal approximation to a fuzzy tolerance relation is discussed in this paper. Specifically, our aim is to get the optimal fuzzy equivalence relation from a given fuzzy tolerance relation. Firstly, we discuss the relationship between the covering and the partition of a set. Then, we give the concept of distance between coverings (or partitions) and put forward the algorithms to get the optimal partition from a given covering. Main results include: 1) give the sufficient and necessary conditions for the optimal approximation of coverings in union sets; 2) give the necessary condition for absolutely optimal approximation of coverings; 3) based on the optimal approximation of each covering in the covering chain, the optimal fuzzy equivalence relation is obtained from the given fuzzy tolerance relation.

Ling Zhang, Yan-ping Zhang, Shu Zhao

### A New Method for Inconsistent Multicriteria Classification

Three relaxation models (VC-DRSA, VP-DRSA and ISVP-DRSA) of DRSA have been proposed to relax the strict dominance principle. However, the classification performance of these models is affected by the value of consistency level

l

. Until now, the value of

l

is set according to prior domain knowledge. But no one knows which value is the best and the reason. To address the multicriteria classification problem, we propose a new method in this paper. A new uncertainty measure is defined and an algorithm for transforming inconsistent preference-ordered systems into consistent ones (TIPStoC) is designed in this paper. An iterative approach is adopted in TIPStoC algorithm. We find that inconsistent preference-ordered information systems can be transformed into consistent systems with low computation complexity, and without losing useful information. The classification performance will be improved with the decision rules induced from the consistent systems. Besides, the value of consistency level

l

is set to 1.0 without depending on prior knowledge. Finally, the procedure of TIPStoC algorithm is illustrated by a real example and the efficiency of the new method is proved by experiments.

Weibin Deng, Guoyin Wang, Shuangxia Yang, Feng Hu

### Probabilistic Similarity-Based Reduct

The attribute selection problem with respect to decision tables can be efficiently solved with the use of rough set theory. However, a known issue in standard rough set methodology is its inability to deal with probabilistic and similarity information about objects. This paper presents a novel type of reduct that takes into account this information. We argue that the approximate preservation of probability distributions and similarity of objects within reduced decision table helps to preserve the quality of its classification capability.

Wojciech Froelich, Alicja Wakulicz-Deja

### Inference Processes in Decision Support Systems with Incomplete Knowledge

Authors propose a new approach in the optimization of inference processes in decision support systems with incomplete knowledge. The idea is based on clustering large set of rules from knowledge bases as long as it is necessary to find a relevant rule as quickly as possible. This work is highly focused on the results of experiments regarding the influence of Agnes’ algorithm parameters on the quality of the clustering process. Additionally, the authors present the results of the experiments regarding the optimal amount of groups formed by decision rules.

Alicja Wakulicz-Deja, Agnieszka Nowak-Brzezińska, Tomasz Jach

### Synthesis of Synchronized Concurrent Systems Specified by Information Systems

We present some applications of a method for synthesis of concurrent models from the specification given by an information system and/or decision rules. A solution generated by the method for the well known problem in concurrency is represented by colored Petri nets (

CP

-nets, in short). The method allows to generate automatically an appropriate

CP

-net from the specification given by an information system and/or decision rules. This kind of specification should be more convenient for the designers of concurrent systems than directly drawing a Petri net, especially in cases of large in size net models. The

CP

-nets produced automatically by the application of the method are sometimes not in the optimal form and some modification procedure could be applied to get more optimal solutions. This problem is also discussed in the paper.

Zbigniew Suraj, Krzysztof Pancerz

### Efficiency of Complex Data Clustering

This work is focused on the matter of clustering complex data using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm and searching through such a structure. It presents related problems, focusing primarily on the aspect of choosing the initial parameters of the density based algorithm, as well as various ways of creating valid cluster representatives. What is more, the paper emphasizes the importance of the domain knowledge, as a factor which has a huge impact on the quality of the clustering. Carried out experiments allow to compare the efficiency of finding clusters relevant to the given question, depending on the way of how the cluster representatives were created.

Alicja Wakulicz-Deja, Agnieszka Nowak-Brzezińska, Tomasz Xięski

### The Extraction Method of DNA Microarray Features Based on Experimental A Statistics

The DNA microarray exploration topic is a really important area of research. Comparing samples of tissues we can find genes, which are characteristic of particular research problems. A number of researchers involved in bioinformatics are attempting to find effective gene extraction methods and classifiers, in order to predict particular medical problems. Even if we do not consider the ontological sense of genes, it is possible by information technology methods to find genes which are the most significant for a given research problem. An exemplary application of DNA microarrays can be the ability to detect some illnesses, personal identification, or distinguishing features of some organisms. In this work we describe our most effective (in the global sense) gene extraction method based on experimental

A

statistics, called SAM5. Next, we use the granular classifier based on weighed voting, which proved the best among those recently studied by Polkowski, and Artiemjew - 8_

v

1_

w

4 algorithm.

Piotr Artiemjew

### Granular Structures in Graphs

The granular structures emphasize a multilevel and multiview understanding of problems. This paper focuses on a study of how to granulate a graph, and how to extract the granular structures in the graph. The granular structures can be seen as an abstract, summary or epitome of the graph. Two granular structures extraction models are proposed, one is based on the degree of the vertex, the other is based on the weight of the edge. Each model is a multilevel representation, and the models integrated together presents a multiview comprehension of the graph.

Guang Chen, Ning Zhong

### Fuzzy Rough Granular Self Organizing Map

A fuzzy rough granular self organizing map (FRGSOM) is proposed for clustering patterns from overlapping regions using competitive learning of the Kohonen’s self organizing map. The development strategy of the FRGSOM is mainly based on granular input vector and initial connection weights. The input vector is described in terms of fuzzy granules

low

,

medium

or

high

, and the number of granulation structures depends on the number of classes present in the data. Each structure is developed by a user defined

α

-value, labeled according to class information, and presented to a decision system. This decision system is used to extract domain knowledge in the form of dependency factors using fuzzy rough sets. These factors are assigned as the initial connection weights of the proposed FRGSOM, and then the network is trained through competitive learning. The effectiveness of the FRGSOM is shown on different real life data sets.

Avatharam Ganivada, Shubhra Sankar Ray, Sankar Kumar Pal

### Knowledge Acquisition in Inconsistent Multi-scale Decision Systems

The key to granular computing is to make use of granules in problem solving. However, there are different granules at different levels of scale in data sets having hierarchical scale structures. Therefore, the concept of multi-scale decision systems is introduced in this paper, and a formal approach to knowledge acquisition measured at different levels of granulations is also proposed, and some algorithms for knowledge acquisition in inconsistent multi-scale decision systems are proposed with illustrative examples.

Shen-Ming Gu, Wei-Zhi Wu

### Text Clustering Based on Granular Computing and Wikipedia

Text clustering plays an important role in many real-world applications, but it is faced with various challenges, such as, curse of dimensionality, complex semantics and large volume. A lot of researches paid attention to deal with such problems by designing new text representation models and clustering algorithms. However, text clustering still remains a research problem due to the complicated properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing with the aid of Wikipedia. The proposed clustering method firstly identifies the text granules, especially focusing on concepts and words with the aid of Wikipedia. And then, it mines the latent patterns based on the computation of such granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) have shown that the proposed method improves the performance of text clustering by comparing with the existing clustering algorithm together with the existing representation models.

Liping Jing, Jian Yu

### Rough Relations, Neighborhood Relations, and Granular Computing

Based on formal relationships in Pansystems, this paper discusses the notion of indistinguishability relations among some order pairs in a binary relation and introduces rough relations. A rough relation is a generalization of a rough set from a new multi-dimensional viewpoint. In an exponent poset, an indistinguishability relation can be interpreted as a neighborhood relation. By treating a neighborhood relation closure as a universe, the notion of rough degree can be introduced. An inherent relation of granular computing is interpretated based on rough relations.

He Lin, Yao Zhou

### Comparing Clustering Schemes at Two Levels of Granularity for Mobile Call Mining

Datasets in many applications can be viewed at different levels of granularity. Depending on the level of granularity, data mining techniques can produce different results. Correlating results from different levels of granularity can improve the quality of analysis. This paper proposes a process and measures for comparing clustering results from two levels of granularity for a mobile call dataset. The clustering is applied to the phone calls as well as phone numbers, where phone calls are finer granules while phone numbers are coarser granules. The coarse granular clustering is then expanded to a finer level and finer granular clustering is contracted to the coarser granularity for additional qualitative analysis. The paper uses a popular cluster quality measure called Davies-Bouldin index as well as a proposal for transforming clustering schemes between different levels of granularity.

Pawan Lingras, Parag Bhalchandra, Satish Mekewad, Ravindra Rathod, Santosh Khamitkar

### Granular-Based Partial Periodic Pattern Discovery over Time Series Data

Periodicity analysis of the time series is getting more and more significant. There are many contributions for periodic pattern discovery, however, few laid emphasis on the further usage. In the paper, we propose a granular-based partial periodic pattern detecting method over time series data. This method can detect all patterns of every possible periodicity without any prior knowledge of the data sets, by setting different granularity and minimum support threshold. The results that it learned can be used in outlier or change point detection in time series data analysis. The experiment results show its effectiveness.

Aibao Luo, Xiuyi Jia, Lin Shang, Yang Gao, Yubin Yang

### Approximations of Functions: Toward Rough Granular Calculus

In our previous papers, we investigated rough approximations of functions. These approximations are different than considered by Professor Zdzisław Pawlak [13,14]. In particular, our approach allows us to overcome the drawback of the exiting approach that the lower approximations of functions are usually empty. In this paper, we emphasize a possible application of boolean reasoning in inducing of function approximation. This paper is a step toward developing rough calculus and rough granular calculus as an extension of the approach presented in [4,14].

Andrzej Skowron, Jarosław Stepaniuk

### Bipartite Graphs and Coverings

In many real world applications, data are organized by coverings, instead of partitions. Covering-based rough sets have been proposed to cope with this type of data. Covering-based rough set theory is more general than rough set theory, then there is a need to employ sophisticated theories to make it more adaptive to applications. Covering is one of core concepts in covering-based rough sets, and it is urgent to connect coverings with other data models. This paper establishes the relationship between coverings and bipartite graphs. Through its index set, a covering induces some isomorphic bipartite graphs. Conversely, a bipartite graph induces a covering of the set of vertices. These inductions are converse with each other.

Shiping Wang, William Zhu, Fan Min

### Covering-Based Reduction of Object-Oriented Concept Lattices

To be an efficient tool for knowledge expression and analysis, formal concept analysis has been paid more attention to in recent years. For the object-oriented concept lattice of a formal context, this paper proposes two compression methods to reduce the original lattice based on covering of the object (attribute) set. We firstly introduce the similarity degree and the neighborhood of an object (attribute) set, and then obtain the covering of the object (attribute) set. Based on which, the object-oriented concept lattice can be compressed into a smaller one through adjusting the object (attribute) neighborhood’s size by the similarity degree, and the reduced lattice is a subset of the original one.

Ling Wei, Qiang Li

### Top-Down Progressive Computing

A top-down, step-wise progressive computing model is presented as a mode of granular computing. Based on a multilevel granular structure, progressive computing explores a sequence of refinements from coarser information granulation to finer information granulation. A basic progressive computing algorithm is introduced. Examples of progressive computing are provided.

Yiyu Yao, Jigang Luo

### Least Absolute Deviation Cut

The paper discussed a new additive extension of minimum cut by simultaneously minimizing intra cluster similarity bias and inter cluster similarity, Least Absolute Deviation Cut (LAD cut). The LAD cut can be proved convergent in finite iterative steps, and its theoretical conditions that the LAD cut can work well is also presented. Furthermore, its computational complexity is also analyzed. Numerical experimental results show that LAD cut may be useful for image segmentation.

Jian Yu, Liping Jing

### Hierarchical Qualitative Inference Model with Substructures

Qualitative propagation influences in qualitative inferences are unlike and interrelated on the different hierarchy of knowledge granules, and quantitative information loss easily results in reasoning conflicts. This paper presents a hierarchical qualitative inference model with substructures which to some extent can eliminate the qualitative impact of uncertainty and solve trade-off problems by metastructures with basic decomposition and coarse-grained mesoscale substructures with edge-deletion. The substructural inferences could not only reduce computational complexity, but provide an approximate strategy for modular reasoning on large-scale problems. The example respectively illustrates the two substructural methods are both effective.

Zehua Zhang, Duoqian Miao, Jin Qian

### Decision Rules for Decision Tables with Many-Valued Decisions

In the paper, authors presents a greedy algorithm for construction of exact and partial decision rules for decision tables with many-valued decisions. Exact decision rules can be ‘over-fitted’, so instead of exact decision rules with many attributes, it is more appropriate to work with partial decision rules with smaller number of attributes. Based on results for set cover problem authors study bounds on accuracy of greedy algorithm for exact and partial decision rule construction, and complexity of the problem of minimization of decision rule length.

Igor Chikalov, Beata Zielosko

### Backmatter

Weitere Informationen