2006 | Book

Rough Sets and Current Trends in Computing

5th International Conference, RSCTC 2006, Kobe, Japan, November 6-8, 2006, Proceedings

Edited by: Salvatore Greco, Yutaka Hata, Shoji Hirano, Masahiro Inuiguchi, Sadaaki Miyamoto, Hung Son Nguyen, Roman Słowiński

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Invited Papers

Decision Trees and Flow Graphs

We consider the association of decision trees and flow graphs, resulting in a new method of decision rule generation from data and giving better insight into data structure. The introduced flow graphs can also give a new look at the concept of probability. We show that in some cases the concept of probability can be eliminated and replaced by a study of deterministic flows in a flow network.

Zdzisław Pawlak
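
Pawlak's flow graphs attach to every branch (x, y) a flow φ(x, y); normalizing by the throughflow of x gives the certainty factor and normalizing by the throughflow of y gives the coverage factor. Below is a minimal Python sketch with hypothetical flows; the data and variable names are our illustration, not taken from the paper.

    # Minimal sketch of a Pawlak-style flow graph (hypothetical data).
    # A branch (x, y) carries a flow phi(x, y); certainty and coverage are
    # that flow normalized by the throughflow of x and of y, respectively.
    from collections import defaultdict

    flow = {("old", "high"): 30, ("old", "low"): 10,
            ("young", "high"): 5, ("young", "low"): 55}

    out_flow = defaultdict(float)   # throughflow of each input node
    in_flow = defaultdict(float)    # throughflow of each output node
    for (x, y), phi in flow.items():
        out_flow[x] += phi
        in_flow[y] += phi

    for (x, y), phi in flow.items():
        cer = phi / out_flow[x]     # cer(x, y) = phi(x, y) / phi(x)
        cov = phi / in_flow[y]      # cov(x, y) = phi(x, y) / phi(y)
        print(f"{x} -> {y}: cer={cer:.2f}, cov={cov:.2f}")
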
Granular Computing – The Concept of Generalized Constraint-Based Computation

Basically, granular computing is a mode of computing in which the objects of computation are granular variables. Let X be a variable which takes values in a universe of discourse, U. Informally, a granule is a clump of elements of U which are drawn together by indistinguishability, similarity or proximity. For example, an interval is a granule; so is a fuzzy interval; so is a Gaussian distribution; and so is a cluster of elements of U. A granular variable is a variable which takes granules as values. If G is a value of X, then G is referred to as a granular value of X. If G is a singleton, then G is a singular value of X. A linguistic variable is a granular variable whose values are labeled with words drawn from a natural language. For example, if X is temperature, then 101.3 is a singular value of temperature, while “high” is a granular (linguistic) value of temperature.

Lotfi A. Zadeh
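
As a toy illustration of the distinction drawn above between singular and granular values, the sketch below represents crisp granules as subsets of the universe and a linguistic granule as a fuzzy membership function; the representation is our own simplification, not Zadeh's formalism.

    # Toy sketch: granules over a universe U of temperatures.
    U = range(90, 111)

    singular = {101}                              # a singleton: singular value
    interval = {u for u in U if 99 <= u <= 103}   # an interval granule

    def high(u):
        # a fuzzy granule "high": membership grows from 0 at 100 to 1 at 105
        return max(0.0, min(1.0, (u - 100) / 5.0))

    print(sorted(interval))
    print([(u, high(u)) for u in (100, 102, 105)])
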
Bipolar Representations in Reasoning, Knowledge Extraction and Decision Processes

This paper surveys various areas in information engineering where an explicit handling of positive and negative sides of information is appropriate. Three forms of bipolarity are laid bare. They can be instrumental in logical representations of incompleteness, rule representation and extraction, argumentation, and decision analysis.

Didier Dubois, Henri Prade
Kansei Engineering and Rough Sets Model

M. Nagamachi founded Kansei Engineering at Hiroshima University about 30 years ago, and it has spread throughout the world as an ergonomic, consumer-oriented approach to product development. The aim of kansei engineering is to develop a new product by translating a customer’s psychological needs and feelings (kansei) concerning it into design specifications. So far the kansei data have been analyzed by multivariate statistical analysis to create new products, but kansei data do not always have the linear features assumed under the normal distribution. Rough set theory is able to deal with any kind of data, irrespective of the linear or non-linear characteristics of the data. We compare the results based on statistical analysis and on rough set theory.

Mitsuo Nagamachi
Stochastic Approach to Rough Set Theory

The presentation introduces the basic ideas and investigates the stochastic approach to rough set theory. The major aspects of the stochastic approach to rough set theory to be explored during the presentation are: the probabilistic view of the approximation space, the probabilistic approximations of sets, as expressed via variable precision and Bayesian rough set models, and probabilistic dependencies between sets and multi-valued attributes, as expressed by the absolute certainty gain and expected certainty gain measures, respectively. The measures allow for more comprehensive evaluation of rules computed from data and for computation of attribute reduct, core and significance factors in probabilistic decision tables.

Wojciech Ziarko
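
A minimal sketch of the variable precision idea mentioned above: an equivalence class E is admitted to the lower approximation of a concept X when the conditional probability P(X | E) = |X ∩ E| / |E| reaches a precision threshold β. The data and threshold are illustrative assumptions, not the exact formulation of the presentation.

    # Sketch of variable-precision rough approximations (illustrative data).
    # E enters the beta-lower approximation when P(X | E) >= beta, and the
    # upper approximation when P(X | E) > 1 - beta, with beta in (0.5, 1].
    def vprs_approximations(classes, X, beta=0.8):
        lower, upper = set(), set()
        for E in classes:
            p = len(X & E) / len(E)
            if p >= beta:
                lower |= E
            if p > 1 - beta:
                upper |= E
        return lower, upper

    classes = [{1, 2, 3}, {4, 5}, {6, 7, 8, 9}]   # a partition of U
    X = {1, 2, 3, 4, 6}                           # the target concept
    print(vprs_approximations(classes, X, beta=0.8))
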

Commemorative Papers for Professor Pawlak

Zdzisław Pawlak Commemorating His Life and Work

Zdzisław Pawlak will be remembered as a great human being with exceptional humility, wit and kindness as well as an extraordinarily innovative researcher with exceptional stature. His research contributions have had far-reaching implications inasmuch as his works are fundamental in establishing new perspectives for scientific research in a wide spectrum of fields.

Andrzej Skowron, James F. Peters
Pawlak Rough Set Model, Medical Reasoning and Rule Mining

This paper overviews two important issues concerning the correspondence between Pawlak’s rough set model and medical reasoning. The first main idea of rough sets is that a given concept can be approximated by partition-based knowledge as upper and lower approximations. Interestingly, these approximations correspond to the focusing mechanism of differential medical diagnosis: the upper approximation corresponds to the selection of candidates, and the lower approximation to concluding a final diagnosis. The second idea is that a concept and observations can be represented as partitions in a given data set, where rough sets provide a rule induction method from given data. Thus, this model can be used to extract rule-based knowledge from medical databases. In particular, rule induction based on the focusing mechanism is obtained in a natural way.

Shusaku Tsumoto

Logics in Rough Sets

Algebras of Terms in Pawlak’s Information Systems

The notion of an information system can be seen as a semantic system, but it is essentially algebraic in nature. We show in this note that it can be treated as a many-sorted algebra and that the algebra of terms can be useful in describing and analysing the data in the system.

J. A. Pomykała
Monads Can Be Rough

Traditionally, rough sets build upon relations based on ordinary sets, i.e. relations on X as subsets of X × X. A starting point of this paper is the equivalent view on relations as mappings from X to the (ordinary) power set PX. Categorically, P is a set functor, and even more so, it can in fact be extended to a monad (P, η, μ). This is still not enough and we need to consider the partial order (PX, ≤). Given this partial order, the ordinary power set monad can be extended to a partially ordered monad. The partially ordered ordinary power set monad turns out to contain sufficient structure in order to provide rough set operations. However, the motivation of this paper goes far beyond ordinary relations as we show how more general power sets, i.e. partially ordered monads built upon a wide range of set functors, can be used to provide what we call rough monads.

Patrik Eklund, M. A. Galán
On Testing Membership to Maximal Consistent Extensions of Information Systems

This paper provides a new algorithm for testing membership in maximal consistent extensions of information systems. A maximal consistent extension of a given information system includes all objects corresponding to known attribute values which are consistent with all true and realizable rules extracted from the original information system. The algorithm presented here does not involve computing any rules and has polynomial time complexity. It is based on a simpler criterion for membership testing than the algorithm described in [4]. The criterion under consideration is convenient for the theoretical analysis of maximal consistent extensions of information systems.

Mikhail Moshkov, Andrzej Skowron, Zbigniew Suraj
The Research of Rough Sets in Normed Linear Space

As a new mathematical theory, rough sets have been applied to processing imprecise, uncertain and incomplete data. Research on rough sets has been fruitful for finite and non-empty sets; so far, however, rough sets have only served as a theoretical tool for discretizing real functions, and work on defining rough sets over real functions is infrequent. In this paper, we exploit a new method to define rough sets in normed linear space. We put forward definitions of upper and lower approximations and make a preliminary study of the properties of such rough sets. A new theoretical tool is thereby provided to study approximate solutions of differential equations and functional variation in normed linear space. This research is significant in that it extends the application of rough sets to a new field.

Hui Sun, Qing Liu
Two Kinds of Rough Algebras and Brouwer-Zadeh Lattices

Many researchers study rough sets from the point of view of describing rough set pairs (a rough set pair is also called a rough set), i.e. <lower approximation set, upper approximation set>. Comer [4] showed that all the rough sets in an approximation space constitute a regular double Stone algebra, called the rough double Stone algebra in this paper. Pagliani [19] interpreted the Rough Set System (all the rough sets in an approximation space in disjoint representation) as a Nelson algebra; the Nelson algebra constructed from an approximation space is called the rough Nelson algebra in this paper. It is shown that a rough double Stone algebra is a Brouwer-Zadeh lattice, and that a rough Nelson algebra is also a Brouwer-Zadeh lattice.

Jian-Hua Dai, Hanfei Lv, Weidong Chen, Yunhe Pan

Logics in Fuzzy Sets

Balanced Fuzzy Gates

In this paper, we introduce and study a new concept of balanced fuzzy gates. The idea itself is based upon balanced fuzzy sets, which form an essential extension of the generic theory of fuzzy sets. We discuss the topology of the gates and elaborate on several fundamental models of logic connectives. A particular focus is on the two categories of gates realizing and-type and or-type processing. In the sequel, we present architectures of networks built with the use of these logical gates. We offer some design guidelines for the development of the networks and elaborate on the nature of the structural construction.

Wladyslaw Homenda, Witold Pedrycz
Triangle Algebras: Towards an Axiomatization of Interval-Valued Residuated Lattices

In this paper, we present triangle algebras: residuated lattices equipped with two modal, or approximation, operators and with a third angular point u, different from 0 (false) and 1 (true), intuitively denoting ignorance about a formula’s truth value. We prove that these constructs, which bear a close relationship to several other algebraic structures including rough approximation spaces, provide an equational representation of interval-valued residuated lattices; as an important case in point, we consider $\mathcal{L}^I$, the lattice of closed intervals of [0,1]. As we will argue, the representation by triangle algebras serves as a crucial stepping stone to the construction of formal interval-valued fuzzy logics, and in particular to the axiomatic formalization of residuated t-norm based logics on $\mathcal{L}^I$, in a similar way as was done for formal fuzzy logics on the unit interval.

Bart Van Gasse, Chris Cornelis, Glad Deschrijver, Etienne Kerre

Fuzzy-Rough Hybridization

An Approach to Parameterized Approximation of Crisp and Fuzzy Sets

This paper proposes a concept of parameterized approximation of crisp and fuzzy sets, based on the notion of rough and fuzzy rough inclusion functions. A definition of a single ε-approximation is given. It is suitable for expressing the lower and upper approximations defined in rough set theory and in the variable precision rough set model. A unified form of approximation is especially advantageous in the case of fuzzy information systems. It helps to avoid problems caused by the different forms of fuzzy connectives used in the original definition of fuzzy rough sets. The presented parameterized approach to approximation constitutes an easy-to-implement, straightforward generalization of the variable precision crisp and fuzzy rough set model.

Alicja Mieszkowicz-Rolka, Leszek Rolka
Rough Fuzzy Set Approximations in Fuzzy Formal Contexts

In the paper, we introduce a new kind of fuzzy formal concept derived from an adjoint pair of operations. Based on the discussed fuzzy formal concepts, a pair of rough fuzzy set approximations in fuzzy formal contexts is introduced. The properties of the proposed approximation operators are examined in detail.

Ming-Wen Shao, Min Liu, Wen-Xiu Zhang
Webpage Classification with ACO-Enhanced Fuzzy-Rough Feature Selection

Due to the explosive growth of electronically stored information, automatic methods must be developed to aid users in maintaining and using this abundance of information effectively. In particular, the sheer volume of redundancy present must be dealt with, leaving only the information-rich data to be processed. This paper presents an approach, based on an integrated use of fuzzy-rough sets and Ant Colony Optimization (ACO), to greatly reduce this data redundancy. The work is applied to the problem of webpage categorization, considerably reducing dimensionality with minimal loss of information.

Richard Jensen, Qiang Shen

Approximate and Uncertain Reasoning

Association Reducts: Complexity and Heuristics

We investigate association reducts, which extend previously studied information and decision reducts in their capability of expressing dependencies between groups of attributes in data. We formulate optimization problems related to the most informative associations between groups of attributes and provide heuristic mechanisms for addressing those problems. We also discuss, at a more general level, how to express approximate dependencies between groups of attributes.

Dominik Ślęzak
Planning Based on Reasoning About Information Changes

We consider the problem of reasoning about information changes in the context of complex concepts approximated hierarchically and actions that can be triggered to change properties of the investigated objects. A given object can be in an unwanted state, where some concept is not satisfied to the required degree. We would like to find a plan (a sequence of actions to be executed) for the object’s transformation to a new, acceptable state. The presented approach is based on reasoning about changes.

Andrzej Skowron, Piotr Synak
Rough Approximation Operators in Covering Approximation Spaces

In this paper, we focus on the study of covering-based rough sets in covering approximation spaces. Firstly, two pairs of covering approximation operators are reviewed and their properties are investigated. Secondly, based on the covering of the covering approximation space, two new coverings of the universe are induced, from which two new pairs of covering approximation operators are constructed; furthermore, the properties of these operators are examined. Finally, by a comparison of these approximation operators, some conditions are obtained under which some or all of these approximation operators are equivalent.

Tong-Jun Li

Variable Precision Rough Set Models

A New Method for Discretization of Continuous Attributes Based on VPRS

A new method for the discretization of continuous features based on variable precision rough set theory is proposed and tested in the process of inducing decision trees. By rectifying the error ratio, the generalization capability of decision trees is enhanced through enlarging or reducing the sizes of positive regions. Two ways of computing frequency and width are deployed to calculate the misclassification rate of the data, by which the discretization points are determined, thus reducing the negative effect on decision trees. In the paper, we use several open data sets to test the method. The results are compared with those obtained by C4.5, which shows that the presented method is a feasible way to discretize continuous features in applications.

Jin-Mao Wei, Guo-Ying Wang, Xiang-Ming Kong, Shu-Jie Li, Shu-Qin Wang, Da-You Liu
On Variable Consistency Dominance-Based Rough Set Approaches

We consider different variants of the Variable Consistency Dominance-based Rough Set Approach (VC-DRSA). These variants produce more general (extended) lower approximations than those computed by the Dominance-based Rough Set Approach (DRSA), i.e., lower approximations that are supersets of those computed by DRSA. They define lower approximations that contain objects characterized by a strong but not necessarily certain relation with the approximated sets. This is achieved by the introduction of parameters that control the consistency of objects included in lower approximations. We show that lower approximations generalized in this way enable us to observe dependencies that remain undiscovered by DRSA. Extended lower approximations are also a better basis for rule generation. In the paper, we focus our considerations on different definitions of generalized lower approximations. We also show definitions of VC-DRSA decision rules, as well as their application to classification/sorting and ranking/choice problems.

Jerzy Błaszczyński, Salvatore Greco, Roman Słowiński, Marcin Szeląg
Variable-Precision Dominance-Based Rough Set Approach

In order to handle the ordinality and the monotonic relationship between condition and decision attributes in decision tables, the Dominance-based Rough Set Approach (DRSA) has been developed. Moreover, to handle hesitation in evaluation, the Variable-Consistency Dominance-based Rough Set Approach (VC-DRSA) has been proposed. However, VC-DRSA is not always suitable for treating errors and outliers. In this paper, we propose a new approach, called the Variable-Precision Dominance-based Rough Set Approach (VP-DRSA), suited to accommodating errors and outliers.

Masahiro Inuiguchi, Yukihiro Yoshioka

Incomplete/Nondeterministic Information Systems

Applying Rough Sets to Data Tables Containing Imprecise Information Under Probabilistic Interpretation

Rough sets are applied to data tables containing imprecise information under probabilistic interpretation. A family of weighted equivalence classes is obtained, in which each equivalence class is accompanied by the probabilistic degree to which it is an actual one. By using the family of weighted equivalence classes we can derive a lower approximation and an upper approximation. The lower approximation and the upper approximation coincide with those obtained from methods of possible worlds.

Michinori Nakata, Hiroshi Sakai
Ensembles of Decision Rules for Solving Binary Classification Problems in the Presence of Missing Values

In this paper, we consider an algorithm that generates an ensemble of decision rules. A single rule is treated as a specific, subsidiary base classifier in the ensemble that indicates only one of the decision classes. Experimental results have shown that the ensemble of decision rules is as efficient as other machine learning methods. In this paper we concentrate on a common problem appearing in real-life data: the presence of missing attribute values. To deal with this problem, we experimented with different approaches inspired by the rough set approach to knowledge discovery. The results of those experiments are presented and discussed in the paper.

Jerzy Błaszczyński, Krzysztof Dembczyński, Wojciech Kotłowski, Roman Słowiński, Marcin Szeląg
Expanding Tolerance RST Models Based on Cores of Maximal Compatible Blocks

Based on a tolerance relation, this paper proposes three knowledge representation systems and then discusses their properties from the new perspective of cores of maximal compatible blocks. It also discusses the relationships of the three knowledge representation systems with two others proposed by Kryszkiewicz M. and Wanli C., respectively. It considers measurements such as accuracy measurements and the rough entropies of knowledge. It also defines and studies rough entropies of a set with respect to knowledge in the three systems and obtains several meaningful theorems.

Chen Wu, Xiaohua Hu, Jingyu Yang, Xibei Yang
Local and Global Approximations for Incomplete Data

For completely specified decision tables, where lower and upper approximations are unique, the lower approximation is the largest definable set contained in the approximated set X and the upper approximation of X is the smallest definable set containing X. For incomplete decision tables the existing definitions of upper approximations provide sets that, in general, are not minimal definable sets. The same is true for approximations based on relations that are generalizations of the equivalence relation. In this paper we introduce two definitions of approximations, local and global, such that the corresponding upper approximations are minimal. Local approximations are more precise than global approximations. Global lower approximations may be determined by a polynomial algorithm. However, algorithms to find both local approximations and global upper approximations are NP-hard.

Jerzy W. Grzymala-Busse, Wojciech Rzasa
Missing Template Decomposition Method and Its Implementation in Rough Set Exploration System

We describe a method of dealing with data sets that contain missing information in the case of a classification task. The described method uses a multi-stage scheme that induces and combines classifiers for complete parts of the original data. The principles of the proposed Missing Template Decomposition Method are presented together with a general explanation of the implementation within the RSES framework. The introduced ideas are illustrated with an example of a classification experiment on a real data set.

Jan G. Bazan, Rafał Latkowski, Marcin Szczuka
On Possible Rules and Apriori Algorithm in Non-deterministic Information Systems

A framework of rule generation in Non-deterministic Information Systems (NISs), which follows rough sets based rule generation in Deterministic Information Systems (DISs), is presented. We have already coped with certain rules and minimal certain rules, which are characterized by the concept of consistency, in NISs. We also introduced discernibility functions into NISs. In this paper, possible rules in NISs are focused on. Because of the information incompleteness, a huge number of possible rules may exist, and we introduce a Min-Max strategy and a Max-Max strategy into possible rule generation in NISs. Possible rules based on these strategies are characterized by the criteria minimum support, maximum support, minimum accuracy and maximum accuracy, and an Apriori-based algorithm is applied.

Hiroshi Sakai, Michinori Nakata
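
Under one simplified reading of the minimum/maximum criteria above, each object of a NIS stores a set of possible attribute values, and the support of a descriptor (attr = value) then ranges between counting only objects that certainly match and counting all objects that possibly match. The toy table below is an illustrative assumption, not data from the paper.

    # Sketch: minimum and maximum support of a descriptor (attr = value)
    # in a non-deterministic table, where each cell is a set of possible
    # values. An object certainly matches when its cell is exactly {value};
    # it possibly matches when value is among its possible values.
    table = [
        {"color": {"red"}},             # object 1: value known
        {"color": {"red", "blue"}},     # object 2: value not determined
        {"color": {"blue"}},            # object 3: value known
    ]

    def support_interval(table, attr, value):
        certain = sum(1 for row in table if row[attr] == {value})
        possible = sum(1 for row in table if value in row[attr])
        n = len(table)
        return certain / n, possible / n   # (minimum, maximum) support

    print(support_interval(table, "color", "red"))   # (1/3, 2/3)
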

Decision Support

Generalized Conflict and Resolution Model with Approximation Spaces

This paper considers the problem of a generalized model for conflict analysis and resolution. Such a model would be helpful in analyzing and resolving conflicts in disputes in both government and industry, where disputes and negotiations about various issues are the norm. The issue here is how to model a combination of situations among agents where i) there are disagreements leading to a conflict situation, and ii) an acceptable set of agreements is needed. The solution to this problem stems from pioneering work on this subject by Zdzisław Pawlak, which provides a basis for a generalized model encapsulating a decision system with complex decisions and an approximation space-based conflict resolution using rough coverage. An example of a requirements scope negotiation for an automated lighting system is presented. The contribution of this paper is a rough set based requirements scope determination model using a generalized conflict model with approximation spaces.

Sheela Ramanna, James F. Peters, Andrzej Skowron
Rough Set Approach to Customer Satisfaction Analysis

Customer satisfaction analysis has become a hot issue in strategic management. The basis of any decision in this field is the analysis of the answers of a sample of customers to a specific questionnaire. Traditionally, using a methodology called conjoint analysis, the data obtained from the questionnaires are used to build a collective utility function representing customer preferences. This utility function makes it possible to measure the satisfaction of the customers and to determine the most critical features relevant for the appreciation of the considered products or services. In this paper, we propose an alternative methodology to analyze the data from the questionnaire. Our approach is based on rough set methodology and represents the preferences of the customers by means of simple decision rules such as “if feature α is considered good and feature β is considered sufficient, then the overall evaluation of the product is medium”. The interpretation of the decision rules is simpler and more direct than the interpretation of the utility function given by conjoint analysis. Moreover, the capacity to represent customer preferences in terms of easily understandable “if ..., then ...” statements expressed in natural language makes our approach particularly interesting for Kansei Engineering. The proposed methodology also gives some indications relative to strategic interventions aimed at improving the quality of the offered products and services. The expected efficiency of these interventions is measured, which is very useful for the definition of proper customer satisfaction strategies. Our approach supplies integrated support to customer satisfaction oriented management.

Salvatore Greco, Benedetto Matarazzo, Roman Słowiński
Utility Function Induced by Fuzzy Target in Probabilistic Decision Making

It is widely accepted that a common precept for choice under uncertainty is the expected utility maximization principle, which was established axiomatically. Recently, a formal equivalence has been established and extensively discussed between this principle of choice and the target-based principle, which suggests that one should select an action maximizing the (expected) probability of meeting a (probabilistic) uncertain target. In this paper, we discuss the issue of how to bring fuzzy targets within the reach of the target-based model for a class of decision making under uncertainty problems. Two methods for inducing utility functions from fuzzy targets are discussed and illustrated with an example taken from the literature.

Van-Nam Huynh, Yoshiteru Nakamori, Tu-Bao Ho
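
A small sketch of the target-based principle discussed above. We hedge one modeling step: the fuzzy target is turned into a random target by using its (discretized) membership function as sampling weights, one convention found in the literature; all names and numbers are illustrative.

    # Sketch: target-based valuation v(a) = P(payoff of a >= target),
    # estimated by Monte Carlo with a fuzzy target sampled via its
    # normalized membership weights (illustrative convention).
    import random

    def sample_target(levels, memberships):
        # target level drawn with probability proportional to membership
        return random.choices(levels, weights=memberships, k=1)[0]

    def value(action_sampler, levels, memberships, n=10000):
        hits = sum(action_sampler() >= sample_target(levels, memberships)
                   for _ in range(n))
        return hits / n

    payoff = lambda: random.uniform(0, 10)   # hypothetical uncertain action
    levels = [2, 4, 6, 8]                    # discretized target levels
    memberships = [0.2, 1.0, 1.0, 0.3]       # fuzzy target "about medium"
    print(value(payoff, levels, memberships))
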

Multi-criteria Decision Support

Dominance-Based Rough Set Approach to Decision Involving Multiple Decision Makers

In this paper we present a rough set approach to decisions with multiple decision makers. Since preference order is a crucial feature of data concerning decision situations, and the classical rough set approach is not able to deal with it, we use the Dominance-based Rough Set Approach (DRSA), in which the indiscernibility relation of the classical approach is substituted by a dominance relation. To deal with decisions of multiple decision makers we extend DRSA by introducing specific concepts related to dominance with respect to minimal profiles of evaluations given by multiple decision makers. This extension also provides a general methodology for rough approximations of partial preorders.

Salvatore Greco, Benedetto Matarazzo, Roman Słowiński
Quality of Rough Approximation in Multi-criteria Classification Problems

The Dominance-based Rough Set Approach (DRSA) has been proposed to deal with multi-criteria classification problems, where data may be inconsistent with respect to the dominance principle. In this paper, we consider different measures of the quality of approximation, i.e. the value indicating how inconsistent the decision table is. We begin with the classical definition, based on the relative number of inconsistent objects. Since this measure appears to be too restrictive in some cases, a new approach based on the concept of generalized decision is proposed. Finally, motivated by emerging problems in the presence of noisy data, a third measure, based on object reassignment, is introduced. Properties of these measures are analysed in the light of rough set theory.

Krzysztof Dembczyński, Salvatore Greco, Wojciech Kotłowski, Roman Słowiński
Rough-Set Multiple-Criteria ABC Analysis

Multiple-criteria ABC (MCABC) analysis is conducted using a dominance-based rough set approach. ABC analysis, a well-known technique for inventory planning and control, divides stock-keeping units (SKUs) into three classes according to annual dollar usage. But MCABC analysis offers more managerial flexibility by including other criteria, such as lead time and criticality, in the classification of SKUs. The objective of this paper is to propose an MCABC method that uses the dominance-based rough set approach to generate linguistic rules that represent a decision-maker’s preferences based on the classification of a test data set. These linguistic rules are then used to classify all SKUs. A case study demonstrates that the procedure is feasible.

Ye Chen, Kevin W. Li, Jason Levy, Keith W. Hipel, D. Marc Kilgour

Rough Sets in KDD

A Method of Generating Decision Rules in Object–Oriented Rough Set Models

We propose a method of decision rule generation in the object-oriented rough set models proposed by Kudo and Murai. The object-oriented rough set model is an extension of the “traditional” rough set theory obtained by introducing the object-oriented paradigm used in computer science. The object-oriented rough set model treats objects as instances of some classes, and illustrates structural hierarchies among objects based on is-a and has-a relationships. In this paper, we introduce decision rules in the object-oriented rough set model, and revise the discernibility matrices proposed by Skowron and Rauszer to generate decision rules in the object-oriented rough set model.

Yasuo Kudo, Tetsuya Murai
Knowledge Reduction in Set-Valued Decision Information System

Reduction is one of the key problems in rough set theory due to its applications in data mining, rule induction, classification, etc. In this paper the reductions for a set-valued decision information system (DIS) are studied. The judgment theorem and the discernibility matrix of the generalized decision reduct in a set-valued DIS are given, and the relationships among the generalized decision reduct and alternative types of knowledge reduction in set-valued DISs are investigated. It is proved that the reduct in a consistent set-valued DIS is equivalent to the generalized decision reduct, and that the possible reduct in an inconsistent set-valued DIS is equivalent to the generalized decision reduct. The judgment theorem and the discernibility matrix associated with the possible reduct are also established.

Xiao-Xue Song, Wen-Xiu Zhang
Local Reducts and Jumping Emerging Patterns in Relational Databases

This paper refers to the notion of minimal pattern in relational databases. We study the analogy between two concepts: a local reduct, from rough set theory, and a jumping emerging pattern, originally defined for transactional data. Their equivalence within a positive region and the similarities between eager and lazy classification methods based on both ideas are demonstrated. Since pattern discovery approaches vary significantly, efficiency tests have been performed in order to decide which solution provides a better tool for the analysis of real relational datasets.

Pawel Terlecki, Krzysztof Walczak
Mining Rough Association from Text Documents

It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because of duplications and ambiguities of data values (e.g., terms). This paper presents a novel concept of rough association rules to improve the quality of discovered knowledge in these application areas. The premise of a rough association rule consists of a set of terms (items) and a weight distribution of terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. It is also feasible to update rough association rules dynamically to produce effective results. The experimental results verify that the proposed approach is promising.

Yuefeng Li, Ning Zhong
NetTRS – Induction and Postprocessing of Decision Rules

The Internet service NetTRS, which enables the induction, evaluation, and postprocessing of decision rules, is presented in this paper. The TRS library is the main part of the service. The TRS library makes possible, among other things, the induction of decision rules by means of the tolerance rough sets model.

Marek Sikora, Marcin Michalak
Outlier Detection Based on Rough Membership Function

In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers — individuals who behave in an unexpected way or have abnormal properties. Outlier detection is critically important in the information-based society. In this paper, we propose a new definition for outliers in rough set theory which exploits the rough membership function. An algorithm to find such outliers in rough set theory is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.

Feng Jiang, Yuefei Sui, Cungen Cao
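
The rough membership function on which the proposed outlier definition builds can be sketched as follows; the flagging rule shown (objects of X with low membership in their own equivalence class) is our illustrative reading, not necessarily the paper's exact definition.

    # Sketch: rough membership mu_X(x) = |[x] & X| / |[x]|, where [x] is
    # the equivalence class of x. Objects of X with low membership behave
    # atypically for their class and are flagged as outlier candidates.
    def rough_membership(x, classes, X):
        block = next(E for E in classes if x in E)   # the class [x]
        return len(block & X) / len(block)

    classes = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9}]      # partition of U
    X = {1, 2, 3, 5, 7}                              # target concept
    outliers = [x for x in X if rough_membership(x, classes, X) < 0.5]
    print(outliers)                                  # -> [7]
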

Rough Sets in Medicine

An Approach to a Rough Set Based Disease Inference Engine for ECG Classification

An inference engine for the classification of ECG signals is developed with the help of a rule-based rough set decision system. For this purpose an automated system for extracting ECG data from ECG strips is developed using a few image processing techniques. Filtering techniques are used for the removal of noise from the recorded ECG. A knowledge base is developed after consulting different medical books and the feedback of reputed cardiologists regarding ECG interpretation and the selection of essential time-plane features of the ECG signal. An algorithm for the extraction of different time-domain features is also developed with the help of differentiation techniques and syntactic approaches. Finally, a rule-based rough set decision system is generated from these time-plane features for the development of an inference engine for disease classification.

S. Mitra, M. Mitra, B. B. Chaudhuri
Attribute Selection for EEG Signal Classification Using Rough Sets and Neural Networks

This paper describes the application of rough sets and neural network models to the classification of electroencephalogram (EEG) signals from two patient classes: normal and epileptic. First, the wavelet transform (WT) was applied to the EEG time series in order to reduce the dimensionality and highlight important features in the data. Statistical measures of the resulting wavelet coefficients were used for the classification task. Employing rough sets, we sought to determine which of the acquired attributes were necessary/informative as predictors of the decision classes. The results indicate that rough sets were able to classify the datasets with an accuracy of almost 100%. The resulting rule sets were small, with an average cardinality of 6. These results were confirmed using standard neural network based classifiers.

Kenneth Revett, Marcin Szczuka, Pari Jahankhani, Vassilis Kodogiannis
Automatic Planning of Treatment of Infants with Respiratory Failure Through Rough Set Modeling

We discuss an application of rough set tools for modeling networks of classifiers induced from data and ontology of concepts delivered by experts. Such networks allow us to develop strategies for automated planning of a treatment of infants with respiratory illness. We report results of experiments with the networks of classifiers used in automated planning of the treatment of newborn infants with respiratory failure. The reported experiments were performed on medical data obtained from the Neonatal Intensive Care Unit in the Department of Pediatrics, Collegium Medicum, Jagiellonian University.

Jan G. Bazan, Piotr Kruczek, Stanislawa Bazan-Socha, Andrzej Skowron, Jacek J. Pietrzyk
Developing a Decision Model for Asthma Exacerbations: Combining Rough Sets and Expert-Driven Selection of Clinical Attributes

The paper describes the development of a clinical decision model to help Emergency Department physicians assess the severity of pediatric asthma exacerbations. The model should support an early identification (at 2 hours) of those patients who are having a mild attack and those who are having a moderate/severe attack. A comprehensive approach combining rough sets and expert-driven manual feature selection was applied to develop a rule-based decision model from retrospective data that described asthmatic patients visiting the Emergency Department. The experiment involved creating the following four potential decision models differentiated by the subsets of clinical attributes that were considered: Model A using all attributes collected in the retrospective chart study; Model B using only attributes describing the patient’s history; Model C using only attributes describing the triage and repeated assessments; and Model D using attributes from Model C expanded with some of the attributes from Model B identified by expert clinical knowledge. Model D offered the highest assessment accuracy when tested on an independent retrospective data set and was selected as the decision model for asthma exacerbations.

Ken Farion, Wojtek Michalowski, Szymon Wilk

Granular Computing

A GrC-Based Approach to Social Network Data Protection

Social network analysis is an important methodology in sociological research. Although social network data is very useful to researchers and policy makers, releasing it to the public may cause an invasion of privacy. In this paper, we generalize the techniques used to protect private information in tabulated data, and propose some safety criteria for assessing the risk of breaching confidentiality by releasing social network data. We assume a situation of data linking, where data is released to a particular user who has some knowledge about individual nodes of a social network. We adopt description logic as the underlying knowledge representation formalism and consider the safety criteria in both open-world and closed-world contexts.

Da-Wei Wang, Churn-Jung Liau, Tsan-sheng Hsu
An Interpretation of Flow Graphs by Granular Computing

The flow graph (FG) is a unique approach in data mining and data analysis, mainly by virtue of the well-structured characteristics of its network, which is naturally consistent with granular computing (GrC). Meanwhile, GrC provides us with both structured thinking at the philosophical level and structured problem solving at the practical level. The main objective of the present paper is to develop a simple and more concrete model for flow graphs using GrC. First, the FG is discussed in three aspects under GrC, namely, the granulation of the FG, and some relationships and operations of granules. Moreover, as one of the advantages of this interpretation, an efficient approximation reduction algorithm for flow graphs is given under the framework of GrC.

Jigui Sun, Huawen Liu, Changsong Qi, Huijie Zhang
Attribute Reduction Based on Granular Computing

Attribute reduction is a very important issue in data mining and machine learning. Granular computing is a new kind of soft computing theory. A novel method for encoding granules using a bitmap technique is proposed in this paper. A new attribute reduction method based on granular computing is also developed with this encoding method, and it is proved to be efficient.

Jun Hu, GuoYin Wang, QingHua Zhang, XianQuan Liu
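
A minimal sketch of the bitmap idea: a granule (a set of objects) becomes an integer bitmask, so the set intersections and containment tests needed during reduction turn into fast bitwise operations. The encoding below is a generic illustration, not necessarily the authors' exact scheme.

    # Sketch: granules as bitmasks over a universe of n objects.
    # Object i belongs to the granule iff bit i is set.
    def granule(indices):
        mask = 0
        for i in indices:
            mask |= 1 << i
        return mask

    g1 = granule([0, 1, 2, 5])        # {0, 1, 2, 5}
    g2 = granule([1, 2, 3])           # {1, 2, 3}

    both = g1 & g2                    # intersection via bitwise AND
    subset = (g1 & g2) == g2          # is g2 a subset of g1?
    size = bin(both).count("1")       # cardinality of the intersection
    print(f"intersection size={size}, g2 subset of g1: {subset}")
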
Methodological Identification of Information Granules-Based Fuzzy Systems by Means of Genetic Optimization

In this study, we introduce information granules-based fuzzy systems and a methodological identification by means of genetic optimization to carry out the model identification of complex and nonlinear systems. Information granulation realized with Hard C-Means clustering helps determine the initial parameters of the fuzzy model, such as the initial apexes of the membership functions in the premise part and the initial values of the polynomial functions in the consequence part of the fuzzy rules. The initial parameters are then tuned effectively with the aid of genetic algorithms and the least squares method. The design methodology emerges as a hybrid of structural optimization and parametric optimization. In particular, genetic algorithms (GAs) and HCM clustering are used to generate a structurally as well as parametrically optimized fuzzy model. To identify the structure and parameters of the fuzzy model we exploit methodologies of respective and consecutive identification by means of genetic algorithms. The proposed model is contrasted with the performance of conventional fuzzy models reported in the literature.

Sung-Kwun Oh, Keon-Jun Park, Witold Pedrycz
Optimization of Information Granulation-Oriented Fuzzy Set Model Using Hierarchical Fair Competition-Based Parallel Genetic Algorithms

In this study, we introduce a hybrid optimization of fuzzy inference systems that is based on information granulation and Hierarchical Fair Competition-based Parallel Genetic Algorithms (HFCGA). The granulation is realized with the aid of Hard C-Means clustering, while HFCGA, a kind of multi-population Parallel Genetic Algorithm (PGA), is used for structure optimization and parameter identification of the fuzzy set model. It concerns fuzzy model-related parameters such as the number of input variables, a collection of specific subsets of input variables, the number of membership functions, and the apexes of the membership functions. In the hybrid optimization process, two general optimization mechanisms are explored: structural optimization is realized via HFCGA and the HCM method, whereas parametric optimization proceeds with a standard least squares method as well as HFCGA and the HCM method. A comparative analysis demonstrates that the proposed algorithm is superior to conventional methods.

Jeoung-Nae Choi, Sung-Kwun Oh, Witold Pedrycz

Grey Systems

A Grey-Based Rough Set Approach to Suppliers Selection Problem

The supplier selection problem is one of the most important components of supply chain management. In recent years, rough set theory has emerged as a powerful tool for the supplier selection problem. In this paper, we propose a grey-based rough set approach to resolve the supplier selection problem. The work is motivated by the following observations: First, in the decision table of rough set theory, attribute values must be known precisely, while decision makers’ judgments on attributes often cannot be estimated by exact numerical values. Second, in rough set theory, the alternatives of ideal suppliers are decided by the lower approximation, so the ranks of the ideal suppliers are equal, and it is therefore difficult to select the best ideal supplier. The work procedure is briefly as follows: First, the attribute values of the rough set decision table for all alternatives are decided by linguistic variables that can be expressed as grey numbers. Second, ideal suppliers are decided by the lower approximation of grey-based rough set theory. Third, the best ideal supplier is decided by grey relational analysis based on grey numbers. Finally, an example of a supplier selection problem is used to illustrate the proposed approach.

Guo-Dong Li, Daisuke Yamaguchi, Hui-Shan Lin, Kun-Li Wen, Masatake Nagai
A Hybrid Grey-Based Dynamic Model for International Airlines Amount Increase Prediction

In this paper, we propose a hybrid grey-based dynamic model and apply it to the problem of predicting the increase in the number of international airlines in China. The work is motivated by the following observations: First, a system of international airlines is an uncertain dynamic system, and the effects of other systems on the monitored system are also unclear, so it is difficult to predict the next annual airlines amount from the system. Second, grey system theory is one of the methods used to study uncertainty, and it is superior in the mathematical analysis of systems with uncertain information. The system of international airlines can be viewed as a grey dynamic system, so the grey dynamic model GM(1,1), a single-variable first-order differential prediction model based on grey system theory, can be used to solve the prediction problem. Third, since the development trend of the international airlines amount is affected by various random factors, it is difficult to obtain high prediction accuracy with a single grey dynamic model. The work procedure is briefly as follows: First, a Markov chain is integrated into GM(1,1) to enhance the prediction accuracy. Second, we present a Taylor approximation method based on grey interval analysis to obtain still higher accuracy. Finally, the statistical data on the international airlines amount from 1985 to 2003 in China is used to verify the effectiveness of the proposed model.

Guo-Dong Li, Daisuke Yamaguchi, Kun-Li Wen, Masatake Nagai
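
For reference, the textbook GM(1,1) construction mentioned above proceeds from the accumulated generating operation (AGO) to a first-order whitening equation; this is standard grey-system background, not a derivation specific to the paper. For a non-negative series $x^{(0)}(1), \dots, x^{(0)}(n)$:

$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \qquad \frac{dx^{(1)}}{dt} + a\,x^{(1)} = b,$

$\hat{x}^{(1)}(k+1) = \Big(x^{(0)}(1) - \frac{b}{a}\Big)e^{-ak} + \frac{b}{a}, \qquad \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k),$

where $[a, b]^T$ is estimated by least squares, $\hat{a} = (B^T B)^{-1} B^T Y$, with $Y = [x^{(0)}(2), \dots, x^{(0)}(n)]^T$ and row $k$ of $B$ equal to $[-z^{(1)}(k), 1]$ for the consecutive means $z^{(1)}(k) = \tfrac{1}{2}(x^{(1)}(k) + x^{(1)}(k-1))$.
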
On the Combination of Rough Set Theory and Grey Theory Based on Grey Lattice Operations

A new rough set, named the grey-rough set and based on the grey lattice operation in grey system theory, is proposed in this paper. Information systems record not only categorical data but also numerical data, including ranges of intervals. In order to handle interval data in such information systems, we describe two sorts of new rough approximation after introducing grey lattice operations: a special grey-rough set based on the equivalence relation of interval coincidence, and a general grey-rough set based on the meet operation and the inclusion relation instead of the equivalence relation. The special grey-rough set is applicable to categorical data and numerical discrete data like the traditional rough set. The general grey-rough set is applicable to numerical interval data, which means that the proposal is an advanced method for non-deterministic information systems. The proposal is illustrated with several examples.

Daisuke Yamaguchi, Guo-Dong Li, Masatake Nagai
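
A small sketch of the interval operations that drive the general grey-rough approximation above; representing a grey number as a closed interval (lo, hi) is our simplifying assumption.

    # Sketch: closed intervals as (lo, hi) tuples; the meet is the
    # intersection (when non-empty) and inclusion is containment.
    def meet(a, b):
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        return (lo, hi) if lo <= hi else None   # None: empty meet

    def includes(a, b):
        # does interval a include interval b?
        return a[0] <= b[0] and b[1] <= a[1]

    a, b = (1.0, 5.0), (3.0, 7.0)
    print(meet(a, b))          # (3.0, 5.0)
    print(includes(a, b))      # False
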

Ontology and Mereology

An Ontology-Based First-Order Modal Logic

First-order modal logic is not just propositional modal logic plus classical quantifier machinery; the situation is much subtler than that. The addition of quantifiers to propositional modal logic may lead to many difficulties. In this paper we aim to solve one of them: the problem of rigidity versus non-rigidity for variables, that is, how to determine the denotations of each variable in different possible worlds, or the connections among the denotations of each variable in different possible worlds. Since none of the currently proposed semantics for first-order modal logic is suitable for solving this problem, we propose an ontology-based first-order modal semantics, in which ontologies are introduced to constrain the modal logic frames and models. An ontology-based counterpart relation S is introduced into each model. By requiring that the assignments of each variable in different possible worlds accord with the relation S, we can correctly characterize the connections among the denotations of each variable in different worlds.

Feng Jiang, Yuefei Sui, Cungen Cao
Enhancing a Biological Concept Ontology to Fuzzy Relational Ontology with Relations Mined from Text

In this paper we investigate the problem of enriching an existing biological concept ontology into a fuzzy relational ontology structure using generic biological relations and their strengths mined from tagged biological text documents. Though biological relations in a text are defined between a pair of entities, the entities are usually tagged by their concept names in a tagged corpus. Since the tags themselves are related taxonomically, as given in the ontology, the mined relations have to be properly characterized before entering them into the ontology. We have proposed a mechanism to generalize each relation to be defined at the most appropriate level of specificity, before it can be added to the ontology. Since the mined relations have varying degrees of association with various biological concepts, an appropriate fuzzy membership generation mechanism is proposed to fuzzify the strengths of the relations. Extensive experimentation has been conducted over the entire GENIA corpus and the results of enhancing the GENIA ontology are presented in the paper.

Lipika Dey, Muhammad Abulaish
On a Parthood Specification Method for Component Software

The Basic Mereology framework of [8] is enriched by adding colimit construction from Category Theory. The new framework is then used to model component-based software architecture.

Dai Tri Man Le, Ryszard Janicki
Ontology Driven Concept Approximation

This paper investigates the concept approximation problem using an ontology as a domain knowledge representation model together with rough set theory. In [7,8], we presented a rough set based multi-layered learning framework for the approximation of complex concepts, assuming the existence of a simple concept hierarchy. The proposed methodology utilizes the ontology structure to learn compound concepts using the rough approximations of the primitive concepts as input attributes. In this paper we consider an extended model for knowledge representation where the concept hierarchies are embedded with additional knowledge in the form of relations or constraints among sub-concepts. We present an extended multi-layered learning scheme that can incorporate the additional knowledge, and we propose some classes of such relations that assure an improvement of the learning algorithm as well as convenience in the knowledge modeling process. We illustrate the proposed method and present some results of experiments with data from the sunspot recognition problem.

Sinh Hoa Nguyen, Trung Thanh Nguyen, Hung Son Nguyen

Statistical Methods

A Statistical Method for Determining Importance of Variables in an Information System

A new method for estimation of attributes’ importance for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: values of each attribute found unimportant at earlier steps are randomly permuted between objects. At each step, apparent importance of an attribute is calculated and the attribute is declared unimportant if its importance is not uniformly better than that of the attributes earlier found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones are retained. Statistical significance of the results so obtained is verified. This method has been applied to 12 data sets of biological origin. The method was shown to be more reliable than that based on standard application of a random forest to assess attributes’ importance.

Witold R. Rudnicki, Marcin Kierczak, Jacek Koronacki, Jan Komorowski
Distribution of Determinants of Contingency Matrix

This paper gives an empirical analysis of determinants, which validates the trade-off between sample size and the size of a matrix. In former studies, relations between the degree of granularity and the dependence of contingency tables were given from the viewpoint of determinantal divisors and sample size. The nature of determinantal divisors shows that an increase in the degree of granularity may lead to an increase in dependence. However, a constraint on the sample size of a contingency table is very strong, which leads to an evaluation formula in which an increase in the degree of granularity gives a decrease in dependency. This paper gives a further study of the effect of sample size on the degree of dependency in a contingency matrix. The results show that sample size restricts the nature of the matrix in a combinatorial way, which suggests that the dependency is closely related with integer programming.

Shusaku Tsumoto, Shoji Hirano
Interpretation of Contingency Matrix Using Marginal Distributions

This paper shows a formal analysis of a contingency table based on its marginal distributions. The main approach is to make an expected matrix from the two given marginal distributions and take the difference between the original cell values and the expected values to construct a residual matrix. The most important characteristics of a residual matrix are the following: (1) its determinant is equal to 0, which implies that the rank of this matrix is less than the rank of the original matrix; (2) each element of a residual matrix can be represented as a linear combination of 2 × 2 subdeterminants. These characteristics show that the residual of a contingency matrix is closely related with 2 × 2 subdeterminants, which also shows that the χ² test statistic is a function of 2 × 2 subdeterminants and marginal sums, and suggests that the distribution of determinants has an important meaning for this statistic.

Shusaku Tsumoto, Shoji Hirano
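
To make the above concrete, the standard constructions involved can be written out (this is classical background, not the paper's own derivation). For an $r \times c$ table with cell counts $o_{ij}$, row sums $r_i$, column sums $c_j$ and total $N$:

$e_{ij} = \frac{r_i c_j}{N}, \qquad d_{ij} = o_{ij} - e_{ij}, \qquad \chi^2 = \sum_{i,j} \frac{d_{ij}^2}{e_{ij}}.$

In particular, for a $2 \times 2$ table the statistic collapses to a single subdeterminant:

$\chi^2 = \frac{N\,(o_{11} o_{22} - o_{12} o_{21})^2}{r_1 r_2 c_1 c_2}.$
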

Machine Learning

A Model of Machine Learning Based on User Preference of Attributes

A formal model of machine learning that takes user preference of attributes into account is proposed in this paper. The model seamlessly combines internal information and external information, and it can be extended to user preference of attribute sets. By using the user preference of attribute sets, user-preferred reducts can be constructed.

Yiyu Yao, Yan Zhao, Jue Wang, Suqing Han
Combining Bi-gram of Character and Word to Classify Two-Class Chinese Texts in Two Steps

This paper presents a two-step method combining two types of features for two-class Chinese text categorization. First, character bi-grams are regarded as candidate features and a Naive Bayesian classifier is used to classify the texts; the fuzzy area between the two categories is then fixed directly according to the outputs of the classifier. Second, word bi-grams with parts of speech verb or noun are regarded as candidate features, and a Naive Bayesian classifier identical to that of the first step is used to deal with the documents falling into the fuzzy area, which are considered unreliably classified in the previous step. Our experiments validated the soundness of the proposed method, which achieved high performance, with precision, recall and F1 being 97.65%, 97.00% and 97.31% respectively on a test set consisting of 12,600 Chinese texts.

Xinghua Fan, Difei Wan, Guoying Wang
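
A minimal sketch of the two-step scheme as we read it: the first classifier's posterior decides confidently outside a fuzzy band, and borderline documents are deferred to a second classifier trained on different features. The thresholds and stand-in classifiers are illustrative assumptions.

    # Sketch of two-step classification with a fuzzy (reject) band.
    # classify1/classify2 return P(class1 | doc); only documents whose
    # first-step posterior falls inside the band go to the second step.
    LOW, HIGH = 0.4, 0.6     # illustrative band around the decision point

    def two_step(doc, classify1, classify2):
        p = classify1(doc)                 # first step: character bi-grams
        if p >= HIGH:
            return "class1"
        if p <= LOW:
            return "class2"
        # fuzzy area: defer to the second step (word bi-grams)
        return "class1" if classify2(doc) >= 0.5 else "class2"

    # usage with stand-in classifiers
    print(two_step("doc", lambda d: 0.55, lambda d: 0.3))   # -> class2
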
Combining Monte Carlo Filters with Support Vector Machines for Option Price Forecasting

This study proposes a hybrid model for online forecasting of option prices. The hybrid predictor combines a Monte Carlo filter with a support vector machine. The Monte Carlo filter (MCF) is used to infer the latent volatility and discount rate of the Black-Scholes model, and makes a subsequent prediction. The support vector machine is employed to capture the nonlinear residuals between the actual option prices and the MCF predictions. Taking the option transaction data on the Taiwan composite stock index, this study examined the forecasting accuracy of the proposed model. The performance of the hybrid model is superior to traditional extended Kalman filter models and pure SVM forecasts. The results can help investors to control and hedge their risks.

Shian-Chang Huang, Tung-Kuang Wu
Domain Knowledge Assimilation by Learning Complex Concepts

Domain, or background, knowledge has proven to be a key component in the development of high-performance classification systems, especially when the objects of interest exhibit complex internal structures, as in the case of images, time series data or action plans. This knowledge usually comes in extrinsic forms such as human expert advice, often contains complex concepts expressed in quasi-natural descriptive languages, and needs to be assimilated by the classification system. This paper presents a framework for the assimilation of such knowledge, equivalent to matching different ontologies of complex concepts, using rough mereology theory and rough set methods. We show how this framework allows a learning system to acquire complex, highly structured concepts from an external expert in an intuitive and fully interactive manner. We also argue the need to focus on expert knowledge elicited from outlier or novel samples, which we deem to have a crucial impact on the classification process. Experiments on a large collection of handwritten digits are discussed.

Tuan Trung Nguyen
Learning Compound Decision Functions for Sequential Data in Dialog with Experts

In this paper, we investigate the problem of learning decision functions for sequential data describing complex objects that are composed of subobjects. The decision function maps a sequence of attribute values into a relational structure representing properties of the object described by the sequence. This relational structure is constructed in a way that allows us to answer questions from a given language. The decision function is constructed by means of a rule system. The rules are learned incrementally in a dialog with an expert. We also present an algorithm that implements the rule system, and we apply it to real-life problems.

Wojciech Jaworski
Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data

This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples focused on data with numeric attributes and used domain-specific knowledge to generate virtual examples useful for one particular learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naïve Bayesian network constructed from the given training set. A sampled example is considered useful if it increases the network's conditional likelihood when added to the training set. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that the virtual examples collected this way can help various learning algorithms to derive classifiers of improved accuracy.

Yujung Lee, Jaeho Kang, Byoungho Kang, Kwang Ryel Ryu
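
The sampling-and-evaluation loop might look roughly as follows. This is a simplified sketch, not the paper's exact procedure: it assumes nominal attributes and class labels already encoded as small non-negative integers, and uses scikit-learn's CategoricalNB in place of the paper's naive Bayesian network.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

def cond_loglik(model, X, y):
    # sum_i log P(y_i | x_i); assumes labels y are encoded as 0..K-1
    proba = model.predict_proba(X)
    return np.log(proba[np.arange(len(y)), y]).sum()

def sample_virtual_examples(X, y, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    baseline = cond_loglik(CategoricalNB().fit(X, y), X, y)
    Xv, yv = [], []                      # accepted virtual examples
    for _ in range(n_trials):
        c = rng.choice(y)                # sample a class from the prior
        # sample each attribute from its empirical class-conditional
        x = np.array([rng.choice(X[y == c, j]) for j in range(X.shape[1])])
        cand_X = np.vstack([X] + Xv + [x])
        cand_y = np.concatenate([y, np.asarray(yv + [c], dtype=y.dtype)])
        score = cond_loglik(CategoricalNB().fit(cand_X, cand_y), X, y)
        if score > baseline:             # keep only useful examples
            Xv.append(x); yv.append(c); baseline = score
    return np.array(Xv), np.array(yv)
```
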

Clustering

A Fuzzy-Possibilistic Fuzzy Ruled Clustering Algorithm for RBFNNs Design

This paper presents a new approach to the problem of designing Radial Basis Function Neural Networks (RBFNNs) to approximate a given function. The presented algorithm focuses on the first stage of the design, where the centers of the RBFs have to be placed. This task has commonly been solved by applying generic clustering algorithms, although in other cases specific clustering algorithms were considered. These specific algorithms improved performance by adding elements that let them use the information provided by the output of the function to be approximated, but they did not add problem-specific knowledge. The novelty of the new algorithm is the combination of a fuzzy-possibilistic approach with a supervising parameter and the addition of a new migration step that, through the generation of RBFNNs, is able to take proper decisions on where to move the centers. The algorithm also introduces a fuzzy logic element by setting a fuzzy rule that determines the input vectors influencing each center position; this fuzzy rule considers the output of the function to be approximated and the fuzzy-possibilistic partition of the data.

A. Guillén, I. Rojas, J. González, H. Pomares, L. J. Herrera, A. Prieto
A Partitive Rough Clustering Algorithm

Since rough sets were introduced by Pawlak about 25 years ago, they have become a central part of soft computing. Recently Lingras presented a rough k-means clustering algorithm which assigns data objects to lower and upper approximations of clusters. In our paper we introduce a rough k-medoids clustering algorithm and apply it to four different data sets (synthetic, colon cancer, forest and control chart data). We compare the results of these experiments to Lingras' rough k-means and discuss the strengths and weaknesses of the rough k-medoids.

Georg Peters, Martin Lampart
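
For orientation, the rough assignment rule of Lingras-style rough k-means, as it would be reused with medoids, can be sketched as follows. This is an illustration only; the value of `epsilon` and the details of the authors' medoid update are not taken from the paper.

```python
import numpy as np

def rough_assign(X, medoids, epsilon=1.2):
    lower = [[] for _ in medoids]        # objects certainly in a cluster
    upper = [[] for _ in medoids]        # objects possibly in a cluster
    for i, x in enumerate(X):
        d = np.linalg.norm(medoids - x, axis=1)
        nearest = int(d.argmin())
        # clusters nearly as close as the nearest one
        close = np.flatnonzero(d <= epsilon * d[nearest])
        if len(close) == 1:              # unambiguous: lower approximation
            lower[nearest].append(i)
            upper[nearest].append(i)     # lower is contained in upper
        else:                            # ambiguous: upper approximations only
            for k in close:
                upper[k].append(i)
    return lower, upper
```
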
A Zone-Based Method for Selecting Clusterheads in Wireless Sensor Networks

In this paper, we propose a method for selecting clusterheads, called ‘zone-based method for selecting clusterheads’, in a wireless sensor network to balance the amount of energy consumption over all nodes without generating any isolated sensor nodes. In our method, the network field is first divided into several zones, and each zone includes clusterheads in proportion to its area, which contributes to distributing clusterheads evenly over the network. Simulation results show that our method outperforms LEACH and PEGASIS in terms of network lifetime.

Kyungmi Kim, Hyunsook Kim, Kijun Han
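
The zone-proportional allocation can be illustrated with a small sketch (a hypothetical helper, not from the paper): the clusterhead budget is divided among zones in proportion to their areas, with largest-remainder rounding so the counts sum to the budget.

```python
def allocate_clusterheads(zone_areas, total_heads):
    total_area = float(sum(zone_areas))
    quotas = [total_heads * a / total_area for a in zone_areas]
    counts = [int(q) for q in quotas]          # floor of each quota
    # largest-remainder rounding keeps sum(counts) == total_heads
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:total_heads - sum(counts)]:
        counts[i] += 1
    return counts                               # clusterheads per zone
```
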
An Agglomerative Hierarchical Clustering by Finding Adjacent Hyper-Rectangles

This paper proposes a new clustering method based on connecting adjacent hyper-rectangles. Our method searches for a set of hyper-rectangles satisfying the properties that (1) each hyper-rectangle covers some of the samples, and (2) each sample is covered by at least one of the hyper-rectangles. A collection of connected hyper-rectangles is then taken to be a cluster. We apply an agglomerative hierarchical clustering method to realize the clustering based on connecting adjacent hyper-rectangles. The effectiveness of the proposed method is shown by applying it to artificial data sets. This paper also considers a way of speeding up the agglomerative hierarchical clustering.

Noboru Takagi
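
The adjacency test that underlies connecting hyper-rectangles admits a compact sketch, under the assumption of axis-aligned rectangles given as lower/upper corner vectors:

```python
def adjacent(rect1, rect2, tol=0.0):
    # rect = (lower_corner, upper_corner), each a sequence per dimension;
    # rectangles touch or overlap iff their extents meet in every dimension
    return all(l1 <= u2 + tol and l2 <= u1 + tol
               for (l1, u1), (l2, u2) in zip(zip(*rect1), zip(*rect2)))

# e.g. adjacent(([0, 0], [1, 1]), ([1, 0], [2, 1]))  ->  True
```
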

Data Mining

Evaluating Learning Models for a Rule Evaluation Support Method Based on Objective Indices

We present an evaluation of a rule evaluation support method for post-processing of mined results with rule evaluation models based on objective indices. To reduce the cost of the rule evaluation task, one of the key procedures in data mining post-processing, we have developed a rule evaluation support method with rule evaluation models, which are obtained from objective indices of mined classification rules and the evaluations of a human expert for each rule. We have then evaluated the performance of learning algorithms for constructing rule evaluation models on meningitis data mining as an actual problem, and on ten rule sets from ten kinds of UCI datasets as an artificial problem. With these results, we show the usefulness of our rule evaluation support method.

Hidenao Abe, Shusaku Tsumoto, Miho Ohsaki, Takahira Yamaguchi
Mining the Most Interesting Patterns from Multiple Phenotypes Medical Data

Mining the most interesting patterns from multiple-phenotype medical data poses a great challenge for previous work, which focuses only on bi-phenotype (such as abnormal vs. normal) medical data. Association rule mining can be applied to analyze such datasets, but most of the rules generated are either redundant or meaningless. In this paper, we define two interesting patterns, namely VP (an acronym for "Vital Pattern") and PP (an acronym for "Protect Pattern"), based on a statistical metric. We also propose a new algorithm called MVP that is specially designed to discover these two patterns from multiple-phenotype medical data. The algorithm generates useful rules for medical researchers, from which a clear causal graph can be induced. The experimental results demonstrate that the proposed method enables the user to focus on fewer rules and ensures that the surviving rules are all interesting from the viewpoint of the medical domain. The classifier built on the rules generated by our method outperforms existing classifiers.

Ying Yin, Bin Zhang, Yuhai Zhao, Guoren Wang
Risk Mining: Mining Nurses’ Incident Factors and Application of Mining Results to Prevention of Incidents

To err is human. How can we avoid near misses and achieve medical safety? From this perspective, we analyzed nurses' incident data by data mining, under the quality-control view that near misses are produced by the system rather than by individuals. Nurses' incident data were collected over 18 months in the emergency room. Significant rules (if-then rules) indicated that medication errors are likely to occur when mental concentration is disrupted by interruption of work, etc. Based on the results of the analysis, the nurses' medication check system was improved. The check system was put into effect during the last 6 months, and the frequency of medication errors decreased to about one-twentieth or less. We conclude that data mining analysis contributes to decision support for the prevention of incidents.

Shusaku Tsumoto, Kimiko Matsuoka, Shigeki Yokoyama
Rule Quality Measures in Creation and Reduction of Data Rule Models

Properties of several rule quality measures are characterized in the paper. Possibilities of their application in algorithms of rule induction and reduction are presented. The influence of replacing rule accuracy with the Bayesian confirmation measure has also been tested.

Marek Sikora

Evolutionary Computing

A Distributed Hybrid Heuristics of Mean Field Annealing and Genetic Algorithm for Load Balancing Problem

In this paper, we introduce a novel distributed Mean Field Genetic Algorithm, called MGA, for resource allocation problems in MPI environments, an important issue in parallel processing. The proposed MGA is a hybrid of Mean Field Annealing (MFA) and a simulated-annealing-like Genetic Algorithm (SGA). SGA uses the Metropolis criterion for state transitions, as in simulated annealing, to keep the convergence property of MFA. The proposed MGA combines the rapid convergence of MFA with the effective genetic operations of SGA. Our experimental results indicate that this composition of heuristic mapping methods improves performance over the conventional ones in terms of communication cost, load imbalance and maximum execution time.

Chuleui Hong
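
The Metropolis criterion that SGA borrows from simulated annealing admits a very small sketch (illustrative; the cost function and temperature schedule are placeholders):

```python
import math
import random

def metropolis_accept(old_cost, new_cost, temperature):
    if new_cost <= old_cost:             # always accept an improvement
        return True
    # otherwise accept a worse state with probability exp(-increase / T)
    return random.random() < math.exp((old_cost - new_cost) / temperature)
```
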
Enhancing Global Search Ability of Quantum-Behaved Particle Swarm Optimization by Maintaining Diversity of the Swarm

Premature convergence, the major problem that confronts evolutionary algorithms, is also encountered with the Particle Swarm Optimization (PSO) algorithm. Quantum-behaved Particle Swarm Optimization (QPSO), a novel variant of PSO, is a global-convergence-guaranteed algorithm and has better search ability than the original PSO. But as in PSO and other evolutionary optimization techniques, premature convergence in QPSO is also inevitable. The reason for premature convergence in PSO or QPSO is that the information flow between particles makes the diversity of the population decline rapidly. In this paper, we propose Diversity-Maintained QPSO (DMQPSO). Before describing the new method, we first introduce the origin and development of PSO and QPSO. DMQPSO, along with PSO and QPSO, is tested on several benchmark functions for performance comparison. The experimental results show that DMQPSO outperforms PSO and QPSO in many cases.

Jun Sun, Wenbo Xu, Wei Fang
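
One plausible reading of the diversity-maintenance idea is sketched below; the concrete measure (mean distance to the swarm centroid, normalized by the search-space diagonal) and the threshold response are assumptions, not necessarily the exact DMQPSO formulas.

```python
import numpy as np

def swarm_diversity(positions, search_diagonal):
    # mean distance of particles to the swarm centroid, normalized by
    # the length of the search-space diagonal
    centroid = positions.mean(axis=0)
    return np.linalg.norm(positions - centroid, axis=1).mean() / search_diagonal

# inside the QPSO loop one might do, e.g.:
#     if swarm_diversity(P, diag) < d_low:
#         re-disperse or mutate some particles to restore diversity
```
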
Identification and Speed Control of Ultrasonic Motors Based on Modified Immune Algorithm and Elman Neural Networks

An improved artificial immune algorithm with a dynamic threshold is presented in this paper. Numerical experiments show that, compared with the genetic algorithm and the original real-valued coding artificial immune algorithm, the improved algorithm converges quickly while effectively preventing premature convergence. The proposed algorithm is employed to train the network structure, weights, initial inputs of the context units and self-feedback coefficient of a modified Elman network. A novel identifier and controller are constructed successively based on the proposed algorithm. A simulated dynamic system of the ultrasonic motor (USM) is considered as an example of a highly nonlinear system. The novel identifier and controller are applied to perform speed identification and control of ultrasonic motors. Numerical results show that both the identifier and the controller based on the proposed algorithm possess not only high convergence precision but also robustness to external noise.

Qiao Zhang, Xu Xu, Yanchun Liang

Intelligent Information Systems

A Hybrid and Intelligent System for Predicting Lot Output Time in a Semiconductor Fabrication Factory

Predicting the output time of every lot in a semiconductor fabrication factory (wafer fab) is a critical task for the fab. To further enhance the effectiveness of wafer lot output time prediction, a hybrid and intelligent system is constructed in this study. The system is composed of two major parts (a k-means classifier and a back-propagation-network regression) and has three intelligent features: incorporating the future release plan of the fab (look-ahead), example classification, and artificial neural networking. Production simulation is also applied in this study to generate test examples. According to the experimental results, the prediction accuracy of the hybrid and intelligent system was significantly better than that of four existing approaches (BPN, case-based reasoning (CBR), FBPN and kM-BPN), achieving a 9%~44% (on average 25%) reduction in root-mean-squared error (RMSE) over the comparison basis, BPN.

Toly Chen, Yu-Cheng Lin
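
The classify-then-regress structure can be sketched as follows; scikit-learn's KMeans and MLPRegressor stand in for the paper's k-means classifier and back-propagation network, and all hyperparameters are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

def fit_km_bpn(X, y, n_clusters=4):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    nets = {}
    for k in range(n_clusters):          # one regression net per cluster
        mask = km.labels_ == k
        nets[k] = MLPRegressor(hidden_layer_sizes=(16,),
                               max_iter=2000).fit(X[mask], y[mask])
    return km, nets

def predict_km_bpn(km, nets, X):
    labels = km.predict(X)               # route each lot to its cluster
    return np.array([nets[k].predict(x.reshape(1, -1))[0]
                     for k, x in zip(labels, X)])
```
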
Combining SOM and GA-CBR for Flow Time Prediction in Semiconductor Manufacturing Factory

Flow time in a semiconductor manufacturing factory is highly related to shop floor status; however, the processes are highly complicated and involve more than a hundred production steps. Therefore, a simulation model of the production process of a real wafer fab located in the Hsin-Chu Science-based Park of Taiwan is built. In this research, a hybrid approach combining a Self-Organizing Map (SOM) and Case-Based Reasoning (CBR) for flow time prediction in a semiconductor manufacturing factory is proposed, and a Genetic Algorithm (GA) is applied to fine-tune the weights of features in the CBR model. The flow time and related shop floor status are collected and fed into the SOM for classification. Then, the corresponding GA-CBR is selected and applied for flow time prediction. Finally, using the simulated data, the effectiveness of the proposed method (SGA-CBR) is shown by comparison with other approaches.

Pei-Chann Chang, Yen-Wen Wang, Chen-Hao Liu
Developing Intelligent Applications in Social E-Mail Networks

Both complex networks and Web Intelligence (WI) are novel and promising research fields. This paper primarily discusses how to develop intelligent applications in social e-mail networks (SENs) with WI-related techniques. It gives a common architecture and discusses some important issues, such as creating SENs automatically and SEN analysis, for developing such applications. This paper also describes the full process of implementing an application by studying an illustrative example.

Wenbin Li, Ning Zhong, Y. Y. Yao, Jiming Liu, Chunnian Liu
Functional Extension of the RSDS System

The aim of this work is to describe a way of using the bibliographic database system RSDS and to present its new functional possibilities, regarding its research aspect as well. The system has been designed and created to facilitate access to descriptions of publications and applications related to rough set theory and its use. Besides basic database facilities, the system offers several special extensions, in particular:

– new versions of advanced searching,

– information about co-authors for every author in the system,

– a module for graph-statistical analysis of the system's content,

– a module for classifying scientific publications according to a designed classifier,

– an interactive map of the world showing who works on the development of rough set theory and its applications, and where.

Zbigniew Suraj, Piotr Grochowalski
Hybrid Music Filtering for Recommendation Based Ubiquitous Computing Environment

Existing music recommendation systems are often incapable of making recommendations suited to a user's situation, because content-based filtering obtains only limited metadata from users. Although some studies in recent years have built recommendation systems that employ a large amount of environmental information, they have been unable to satisfy the information needs of users. This study therefore defines the context information required to select music and proposes a hybrid filtering method that combines content-based and collaborative filtering in ubiquitous environments. In addition, this study developed a music recommendation system based on these filtering methods, which significantly improved user satisfaction with music selection.

Jong-Hun Kim, Kyung-Yong Jung, Jung-Hyun Lee
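
At its core, such a hybrid can be a weighted blend of the two scores, as in the sketch below (illustrative; `content_score` and `collab_score` are assumed scoring functions returning per-item relevance, and the paper's context handling is omitted):

```python
def hybrid_recommend(user, items, content_score, collab_score, alpha=0.5):
    # alpha weights the content-based score against the collaborative one
    scored = [(alpha * content_score(user, item)
               + (1 - alpha) * collab_score(user, item), item)
              for item in items]
    return [item for score, item in
            sorted(scored, key=lambda pair: pair[0], reverse=True)]
```
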

Pattern Recognition and Image Processing

A Novel Color Image Watermarking Method Based on Genetic Algorithm and Hybrid Neural Networks

In this paper, a novel intensity-adaptive color image watermarking algorithm based on a genetic algorithm is presented. The adaptive embedding scheme in the wavelet coefficients of the color image's three component sub-images, which belong to texture-active regions, not only improves image quality but also further enhances the security and robustness of the watermarked image. A novel watermark recovery method based on hybrid neural networks is then proposed, which successfully enhances the performance of the watermarking system. The experimental results show that our method is more flexible than traditional methods and successfully achieves the compromise between robustness and image quality.

Yinghua Lu, Jialing Han, Jun Kong, Yulong Yang, Gang Hou
Calibration of Omnidirectional Camera by Considering Inlier Distribution

This paper presents a new self-calibration algorithm for an omnidirectional camera from uncalibrated images that takes the inlier distribution into account. First, a parametric non-linear projection model of the omnidirectional camera is estimated with known rotation and translation parameters. After deriving the projection model, we can compute an essential matrix of the camera with unknown motions and then determine the camera positions. Standard deviations are used as a quantitative measure to select a proper inlier set. The experimental results show that we can achieve a precise estimation of the omnidirectional camera model and the extrinsic parameters, including rotation and translation.

Yongho Hwang, Hyunki Hong
Modified Hough Transform for Images Containing Many Textured Regions

Images containing many textured regions make the result of the Hough transform (HT) very poor. This paper presents an improved HT that deals with such textured images by diminishing the effect of noise edges and using a weighted voting score. The method first eliminates the noise edges resulting from textured regions; it then casts votes for edges in the accumulator array with a weight score determined by the number of sequential votes. Our modified HT is efficient in that it produces important lines first, such as the edges of buildings, while avoiding improper lines arising from the noise edges.

Yun-Seok Lee, Seung-Hun Yoo, Chang-Sung Jeong
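
Weighted Hough voting itself can be sketched compactly (an illustration; the paper's noise-edge elimination and its rule for deriving weights from sequential votes are omitted):

```python
import numpy as np

def weighted_hough(edge_points, weights, img_diag, n_theta=180):
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * img_diag + 1, n_theta))  # (rho, theta) accumulator
    for (x, y), w in zip(edge_points, weights):
        rho = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + img_diag, np.arange(n_theta)] += w   # weighted vote
    return acc
```
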
Relative Color Polygons for Object Detection and Recognition

This paper proposes a new framework of the color model for outdoor scene image detection and recognition. This model enables us to manipulate the color of an image easily. Here, the concept of a 'relative color polygon' for an object composed of uniform color regions is introduced on a 2D color space (XY space). The color similarity is then defined using three kinds of parameters of the polygon: the length and slope of every side and the angle between adjacent sides. This paper addresses how to decide the color similarity by using facts about color shifting in the XY space. The feasibility of the proposed framework has been confirmed through experimental results using outdoor scene images taken under a wide variety of illumination conditions.

Thi Thi Zin, Sung Shik Koh, Hiromitsu Hama
Rough Set Based Image Segmentation of Video Sequences

We describe a rough set based segmentation method for video sequences. In a frame there are many objects and a background, each of which we represent by a region; each region is in turn represented by a rough set, which is an approximate representation of a crisp set. Our method consists of two phases. The first phase updates the regions in three steps. The first step sets the initial parameters, using the previous regions' parameters. The second step updates the object regions by hill climbing with our evaluation function. The third step updates the background region using the other regions. In the second phase, we build a segmentation map of the frame from the regions; the label of an ambiguous pixel is decided using its distance to the regions.

Young Sub Song, Hang Joon Kim
Two Dimensional Laplacianfaces Method for Face Recognition

In this paper we propose the two-dimensional Laplacianfaces method for face recognition. The new algorithm is developed based on two techniques, namely locality preserving embedding and image-based projection. The two-dimensional Laplacianfaces method is not only computationally more efficient but also more accurate than the one-dimensional Laplacianfaces method in extracting facial features for human face authentication. Extensive experiments are performed to test and evaluate the new algorithm using the Yale and AR face databases. The experimental results indicate that the two-dimensional Laplacianfaces method significantly outperforms the existing two-dimensional Eigenfaces, two-dimensional Fisherfaces and one-dimensional Laplacianfaces methods under various experimental conditions.

Niu Ben, Simon Chi Keung Shiu, Sankar Kumar Pal
Unsupervised Learning of Image Recognition with Neural Society for Clustering

A new algorithm for partitional data clustering is presented, the Neural Society for Clustering (NSC). Its creation was inspired by hierarchical image understanding, which requires unsupervised training to build a hierarchy of visual features. Existing clustering algorithms are not well-suited for this task, since they usually split natural groups of patterns into several parts (like k-means) or give only crisp clustering.

Neurons comprising NSC may be viewed as a society of autonomous individuals, all following the same simple algorithm, based on four principles: locality, greediness, balance and competition. The same principles govern large groups of entities in economics, sociology, biology and physics. The advantages of NSC are demonstrated in an experiment with visual data. The paper also presents a new method for objective and quantitative comparison of clustering algorithms, based on the notions of entropy and mutual information.

Marcin Wojnarski

Data Clustering: Algorithms and Applications (Organized Session)

A Framework for Unsupervised Selection of Indiscernibility Threshold in Rough Clustering

The indiscernibility threshold is a parameter in rough clustering that controls the global ability of equivalence relations to discriminate objects. In its second step, rough clustering iteratively refines equivalence relations so that the coarseness of the classification of objects meets the given level of indiscernibility. However, as the relationship between this parameter and the resulting clusters has not been studied yet, users have to determine its value by trial and error. In this paper, we discuss the relationship between the threshold value of the indiscernibility degree and the clustering results, as a framework for automatic determination of the indiscernibility threshold. The results show that the relationship between the indiscernibility degree and the number of clusters draws a globally convex but multi-modal curve, and that the range of indiscernibility degrees yielding the best cluster validity may lie at a local minimum near the global one, which generates a single cluster.

Shoji Hirano, Shusaku Tsumoto
A Fuzzy Neighborhood Model for Clustering, Classification, and Approximations

A fuzzy neighborhood model for analyzing information systems having topological structures on occurrences of keywords is proposed, and algorithms for clustering, classification and approximations similar to generalized rough sets are developed. Real applications include text mining and clustering of keywords on the web. An illustrative example is given.

Sadaaki Miyamoto, Satoshi Hayakawa
A Proposal for Comparison of Impression Evaluation Data Among Individuals by Using Clustering Method Based on Distributed Structure of Data

In the field of marketing, companies often conduct questionnaires among consumers to grasp their impressions of products. Analyzing the evaluation data obtained from consumers enables us to grasp the tendency of the market, to find problems, and to form hypotheses that are useful for product development. The Semantic Differential (SD) method is one of the most useful methods for quantifying human impressions of objects. The purpose of this study is to develop a method for visualizing individual features in the data. This paper proposes a clustering method based on Orthogonal Procrustes Analysis (OPA). The proposed method clusters subjects whose SD evaluation data have similar distributed structures. The analysis leads to the discovery of majority/minority groups and/or groups with unique features. In addition, it enables us to analyze the similarities and differences of objects and impression words among clusters and/or subjects by comparing the cluster centers and/or transformation matrices. This paper applies the proposed method to actual SD evaluation data and shows that the method can investigate the similarity relationships among the objects in each group and compare, among subjects in the same cluster, the similarities and differences of the impression words used for evaluating the objects.

Shou Kuroda, Tomohiro Yoshikawa, Takeshi Furuhashi
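
The Procrustes alignment at the heart of an OPA-based comparison can be sketched as follows, assuming each subject's ratings form an objects-by-impression-words matrix (the clustering built on top of this dissimilarity is omitted):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_distance(A, B):
    # A, B: objects x impression-words rating matrices of two subjects
    R, _ = orthogonal_procrustes(A, B)   # best orthogonal map of A onto B
    return np.linalg.norm(A @ R - B)     # residual after alignment
```
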
Extending Microaggregation Procedures for Time Series Protection

Privacy is becoming a pervasive issue, as nowadays information is gathered by all kinds of information systems.

In this paper we introduce a method for database protection in the case that the data under consideration is expressed in terms of time series. We propose the use of microaggregation for this purpose and extend standard microaggregation so that it works for this kind of data.

Jordi Nin, Vicenç Torra
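
A minimal sketch of microaggregation adapted to time series follows, under simplifying assumptions (a single ordering heuristic and fixed group size k; the paper's extension is more refined): each series is replaced by the centroid of its group of at least k series.

```python
import numpy as np

def microaggregate(series, k=3):
    series = np.asarray(series, dtype=float)   # shape (n_series, length)
    # order series by distance to the overall mean, then cut into
    # consecutive groups of size k (a simple fixed-size heuristic)
    order = np.argsort(np.linalg.norm(series - series.mean(axis=0), axis=1))
    groups = [order[i:i + k] for i in range(0, len(series), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2] = np.concatenate([groups[-2], groups[-1]])  # keep >= k
        groups = groups[:-1]
    protected = np.empty_like(series)
    for g in groups:
        protected[g] = series[g].mean(axis=0)  # publish group centroids only
    return protected
```
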
Lattice-Valued Hierarchical Clustering for Analyzing Information Systems

A generalization of hierarchical clustering is proposed in which the dendrogram is replaced by clusters attached to a lattice diagram; the method is therefore called lattice-valued hierarchical clustering. Different algorithms of lattice-valued clustering are described, with application to information systems in the form of tables studied in rough sets. A simple example shows how the concept of lattice-valued hierarchical clustering is related to classifications in rough sets.

Sadaaki Miyamoto
Postsupervised Hard c-Means Classifier

Miyamoto et al. derived a hard clustering algorithm by defuzzifying a generalized entropy-based fuzzy c-means in which covariance matrices are introduced as decision variables. We apply the hard c-means (HCM) clustering algorithm in a postsupervised classifier, improving the resubstitution error rate by choosing the best clustering results from local minima of an objective function. Due to the nature of prototype-based classifiers, the error rate can easily be improved by increasing the number of clusters, at the cost of computer memory and CPU time. With the HCM classifier, however, the resubstitution error rate, along with the data set compression ratio, is improved on several benchmark data sets using only a small number of clusters per class.

Hidetomo Ichihashi, Katsuhiro Honda, Akira Notsu
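
The prototype-classifier structure can be illustrated as below, with scikit-learn's KMeans standing in for the defuzzified HCM of the paper; the number of clusters per class is a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_prototypes(X, y, clusters_per_class=3):
    protos, labels = [], []
    for c in np.unique(y):               # cluster each class separately
        km = KMeans(n_clusters=clusters_per_class, n_init=10).fit(X[y == c])
        protos.append(km.cluster_centers_)
        labels += [c] * clusters_per_class
    return np.vstack(protos), np.array(labels)

def classify(x, protos, labels):
    # the label of the nearest prototype decides the class
    return labels[np.linalg.norm(protos - x, axis=1).argmin()]
```
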
Rule Induction Via Clustering Decision Classes

In this paper, we examine the effects of applying LEM2 to a hierarchical structure of decision classes. We consider classification problems with multiple decision classes described by nominal condition attributes. To such a problem, we first apply an agglomerative hierarchical clustering method to obtain a dendrogram of decision classes, i.e., a hierarchical structure of decision classes. At each branch of the dendrogram, we then apply LEM2 to induce rules inferring the cluster to which an object belongs. A classification system suitable for the proposed rule induction method is designed. In a numerical experiment, we compare the proposed method under different similarity measures with the standard application of LEM2 and with a randomly generated dendrogram. The results generally demonstrate the advantages of the proposed method.

Yoshifumi Kusunoki, Masahiro Inuiguchi
Several Formulations for Graded Possibilistic Approach to Fuzzy Clustering

Fuzzy clustering is a useful tool for capturing the intrinsic structure of data sets. This paper proposes several formulations for a soft transition of fuzzy memberships from probabilistic partitions to possibilistic ones. In the proposed techniques, the free memberships are obtained by introducing an additional penalty term of the kind used in Possibilistic c-Means. The new features of the proposed techniques are demonstrated in several numerical experiments.

Katsuhiro Honda, Hidetomo Ichihashi, Akira Notsu, Francesco Masulli, Stefano Rovetta
Backmatter
Metadata
Title
Rough Sets and Current Trends in Computing
Edited by
Salvatore Greco
Yutaka Hata
Shoji Hirano
Masahiro Inuiguchi
Sadaaki Miyamoto
Hung Son Nguyen
Roman Słowiński
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-49842-1
Print ISBN
978-3-540-47693-1
DOI
https://doi.org/10.1007/11908029