
2010 | Book

Rough Set and Knowledge Technology

5th International Conference, RSKT 2010, Beijing, China, October 15-17, 2010. Proceedings

Edited by: Jian Yu, Salvatore Greco, Pawan Lingras, Guoyin Wang, Andrzej Skowron

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

The International Conference on Rough Set and Knowledge Technology (RSKT) has been held every year since 2006. RSKT serves as a major forum that brings researchers and industry practitioners together to discuss and deliberate on fundamental issues of knowledge processing and management and knowledge-intensive practical solutions in the current knowledge age. Experts from around the world meet to present state-of-the-art scientific results, to nurture academic and industrial interaction, and to promote collaborative research in rough sets and knowledge technology. The first RSKT was held in Chongqing, China, followed by RSKT 2007 in Toronto, Canada, RSKT 2008 in Chengdu, China and RSKT 2009 in Gold Coast, Australia. RSKT 2010, the 5th in the series, was held in Beijing, China, October 15–17, 2010. This volume contains 98 papers selected for presentation at RSKT 2010. Following the success of the previous conferences, RSKT 2010 continued the tradition of a very rigorous reviewing process. Every submission was reviewed by at least two reviewers. Moreover, RSKT 2010 invited several area chairs to supervise the review process of every submission. Most submissions were reviewed by three experts. The Program Committee members were deeply involved in a highly engaging selection process with discussions among reviewers and area chairs. When necessary, additional expert reviews were sought. As a result, only top-quality papers were chosen for presentation at the conference, including 49 regular papers (acceptance rate of 28%) and 25 short papers (acceptance rate of 14.3%). We would like to thank all the authors for contributing their best papers. Without their support, this conference would not have been possible.

Table of Contents

Frontmatter

Keynote Speech

Comparative Study on Mathematical Foundations of Type-2 Fuzzy Set, Rough Set and Cloud Model

Mathematical representation of a concept with uncertainty is one of the foundations of Artificial Intelligence. The type-2 fuzzy set introduced by Mendel studies the fuzziness of the membership grade of a concept. The rough set proposed by Pawlak defines an uncertain concept through two crisp sets. The cloud model, based on probability measure space, automatically produces random membership grades of a concept through a cloud generator. All three methods concentrate on the essentials of uncertainty and have been applied in many fields for more than ten years. However, their mathematical foundations are quite different. A detailed comparative study of the three methods will reveal the relationships among them and provide a fundamental contribution to Artificial Intelligence with uncertainty.

Deyi Li
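
To make the rough-set side of this comparison concrete, here is a minimal Python sketch of Pawlak's construction: an uncertain concept bracketed by two crisp sets built from equivalence classes. The universe, partition, and target set are invented for illustration.

```python
# Pawlak rough approximation: a target set X is bracketed by two crisp sets
# assembled from equivalence classes (indiscernibility blocks).

def rough_approximations(partition, X):
    """Return (lower, upper) approximations of X w.r.t. a partition."""
    lower, upper = set(), set()
    for block in partition:
        if block <= X:        # block entirely inside X -> certainly in the concept
            lower |= block
        if block & X:         # block overlaps X -> possibly in the concept
            upper |= block
    return lower, upper

partition = [{1, 2}, {3, 4}, {5, 6}]   # invented indiscernibility classes
X = {1, 2, 3}                          # the uncertain concept
lo, up = rough_approximations(partition, X)
print(lo, up)   # {1, 2} and {1, 2, 3, 4}; the boundary {3, 4} carries the uncertainty
```
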
Scientific Challenges in Contextual Advertising

Online advertising has been fueling the rapid growth of the Web, which offers a plethora of free web services, ranging from search, email, news, sports, finance, and video to various social network services. Such free services have accelerated the shift in people’s media time from offline to online. As a result, advertisers are spending more and more of their advertising budget online. This phenomenon is a powerful ecosystem play of users, publishers, advertisers, and ad networks. The rapid growth of online advertising has created enormous opportunities as well as technical challenges that demand computational intelligence. Computational Advertising has emerged as a new interdisciplinary field that studies the dynamics of the advertising ecosystem to solve challenging problems that arise in online advertising.

In this talk, I will provide a brief introduction to various forms of online advertising, including search advertising, contextual advertising, and guaranteed and non-guaranteed display advertising. Then I will focus on the problem of contextual advertising, which is to find the best-matching ads from a large ad inventory for a user in a given context (e.g., page view) to optimize the utilities of the participants in the ecosystem under certain business constraints (blocking, targeting, etc.). I will present a problem formulation and describe scientific challenges in several key technical areas involved in solving this problem, including user understanding, semantic analysis of page content, user response prediction, online learning, ranking, and yield optimization.

Jianchang Mao
F-granulation, Generalized Rough Entropy and Pattern Recognition

The role of rough sets in uncertainty handling and granular computing is described. The significance of their integration with other soft computing tools and the relevance of rough-fuzzy computing, as a stronger paradigm for uncertainty handling, are explained. Different applications of rough granules and certain important issues in their implementation are stated. Three tasks, namely class-dependent rough-fuzzy granulation for classification, rough-fuzzy clustering, and the definition of generalized rough sets for image ambiguity measures and analysis, are then addressed in this regard, explaining the nature and characteristics of the granules used therein.

The merits of class-dependent granulation together with neighborhood rough sets for feature selection are demonstrated in terms of different classification indices. The significance of a new measure, called "dispersion" of classification performance, which focuses on confused classes for higher-level analysis, is explained in this regard. The superiority of rough-fuzzy clustering is illustrated for determining bio-bases (c-medoids) in encoding protein sequences for analysis. Generalized rough sets using the concept of fuzziness in granules and sets are defined for both equivalence and tolerance relations. These are followed by the definitions of entropy and different image ambiguities. Image ambiguity measures, which take into account both the fuzziness in boundary regions and the rough resemblance among nearby gray levels and nearby pixels, have been found to be useful for various image analysis operations.

The talk concludes with stating the future directions of research and challenges.

Sankar K. Pal
Knowledge Discovery about Preferences Using the Dominance-Based Rough Set Approach

The aim of scientific decision aiding is to give the decision maker a recommendation concerning a set of objects (also called alternatives, solutions, acts, actions, ...) evaluated from multiple points of view considered relevant for the problem at hand and called attributes (also called features, variables, criteria, ...). On the other hand, a rational decision maker acts with respect to his/her value system so as to make the best decision. Confrontation of the value system of the decision maker with characteristics of the objects leads to expression of the preferences of the decision maker on the set of objects. In order to recommend the most-preferred decisions with respect to classification, choice or ranking, one must identify decision preferences. In this presentation, we review multi-attribute preference models, and we focus on preference discovery from data describing some past decisions of the decision maker. The considered preference model has the form of a set of "if ..., then ..." decision rules induced from the data. In the case of multi-attribute classification, the syntax of rules is: "if the performance of object a is better (or worse) than given values of some attributes, then a belongs to at least (at most) a given class"; and in the case of multi-attribute choice or ranking: "if object a is preferred to object b in at least (at most) given degrees with respect to some attributes, then a is preferred to b in at least (at most) a given degree". To structure the data prior to induction of such rules, we use the Dominance-based Rough Set Approach (DRSA). DRSA is a methodology for reasoning about ordinal data, which extends the classical rough set approach by handling background knowledge about ordinal evaluations of objects and about monotonic relationships between these evaluations. We present DRSA to preference discovery in the cases of multi-attribute classification, choice and ranking, of single and multiple decision makers, and of decision under uncertainty and time preference. The presentation is mainly based on publications [1,2,3].

Roman Slowinski
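
As a rough illustration of the "at least" rule syntax quoted in the abstract above, here is a minimal Python sketch. The attributes, thresholds, and class labels are invented; real DRSA rules are induced from data rather than written by hand.

```python
# "if performance >= given thresholds on some attributes,
#  then the object belongs to at least a given class"

def rule_at_least(obj, thresholds, min_class):
    """Fire the rule if obj weakly dominates the thresholds on all listed criteria."""
    if all(obj[a] >= v for a, v in thresholds.items()):
        return min_class        # lower bound on the class assignment
    return None                 # rule does not fire

student = {"math": 8, "physics": 7}          # invented evaluations
rule = {"math": 7, "physics": 6}             # condition part of the rule
print(rule_at_least(student, rule, "good"))  # -> 'good' (at least)
```
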
Wikipedia and How to Use It for Semantic Document Representation

Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This talk focuses on the process of "wikification"; that is, automatically and judiciously augmenting a plain-text document with pertinent hyperlinks to Wikipedia articles, as though the document were itself a Wikipedia article. I first describe how Wikipedia can be used to determine semantic relatedness between concepts. Then I explain how to wikify documents by exploiting Wikipedia’s internal hyperlinks for relational information and their anchor texts as lexical information. Data mining techniques are used throughout to optimize the models involved.

I will discuss applications to knowledge-based information retrieval, topic indexing, document tagging, and document clustering. Some of these perform at human levels. For example, on CiteULike data, automatically extracted tags are competitive with tag sets assigned by the best human taggers, according to a measure of consistency with other human taggers. All this work uses English, but involves no syntactic parsing, so the techniques are language independent.

Ian H. Witten
Granular Computing and Computational Complexity

Granular computing imitates humans’ multi-granular strategy for problem solving in order to endow computers with the same capability. Its final goal is to reduce computational complexity. To this end, based on the simplicity principle, the problem at hand should be represented as simply as possible. From structural information theory, it is known that if a problem is represented at different granularities, the hierarchical description of the problem will be a simpler one. The simpler the representation, the lower the computational complexity of problem solving should be. We present a quotient space theory for multi-granular computing. Based on this theory, a problem represented by quotient spaces has a hierarchical structure. Therefore, quotient-space-based multi-granular computing can reduce the computational complexity of problem solving. In the talk, we will discuss, using several examples, how hierarchical representation can reduce the computational complexity of problem solving.

Bo Zhang
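
The complexity argument above can be illustrated with a toy coarse-to-fine search: first locate the target's block in a coarsened (quotient-like) space, then search only within that block. The data and block size are invented; this sketches the principle only, not the quotient space theory itself.

```python
# Coarse level: one test per block. Fine level: scan a single block.
# For 100 items with block size 10, roughly 10 + 10 tests instead of 100.

def hierarchical_find(items, target, block=10):
    blocks = [items[i:i + block] for i in range(0, len(items), block)]
    for b in blocks:                               # coarse (quotient) level
        if b[0] <= target <= b[-1]:
            return [x for x in b if x == target]   # fine level, one block only
    return []

items = list(range(100))             # invented sorted universe
print(hierarchical_find(items, 37))  # -> [37]
```
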

Rough Sets and Computing Theory

Some Comparative Analyses of Data in the RSDS System

The paper concerns the analysis of the data included in the Rough Set Database System (called in short the RSDS system). The data is analyzed by means of statistical and graph-theoretical methods. The results of the analysis show interesting relations among the authors of publications whose descriptions are included in the system. The results may be useful for determining the structure of research groups related to rough sets, the scientific interests of the members of those groups, and the mutual collaboration between groups, as well as for identifying trends appearing in research.

Zbigniew Suraj, Piotr Grochowalski
Rough Temporal Vague Sets in Pawlak Approximation Space

The combination of temporal vague set theory and rough set theory is developed in this paper. The lower and upper approximation operators of a temporal vague set, partitioned by an indiscernibility relation in Pawlak approximation space, are constructed, and the concept of rough temporal vague sets is proposed as a generalization of rough vague sets. Further properties associated with the lower and upper approximations of temporal vague sets are studied. Finally, the roughness measure of a temporal vague set is defined as an extension of the parameterized roughness measure of a vague set, and some properties of this measure are established.

Yonghong Shen
Poset Approaches to Covering-Based Rough Sets

Rough set theory is a useful and effective tool for coping with granularity and vagueness in information systems and has been used in many fields. However, it is hard to compute the reduct of a covering in rough sets. This paper attempts to speed up the computation of the reduct of a covering from a theoretical standpoint. It defines the upset and downset based on a poset in a covering, studies the relationship between reducible elements and downsets, and obtains results such as a necessary and sufficient condition for an element of a covering to be reducible.

Shiping Wang, William Zhu, Peiyong Zhu
1-vs-Others Rough Decision Forest

Bootstrap, boosting and subspace are popular techniques for inducing decision forests. In all these techniques, each decision tree is induced in the same way as a decision tree on the whole data, with all possible classes dealt with together. In such trees, some minority classes may be covered up by others as branches grow or are pruned. For a multi-class problem, this paper proposes to individually induce 1-vs-others rough decision trees for all classes and finally construct a rough decision forest, intending to reduce the possible side effects of imbalanced class distribution. Since all training samples are reused to construct the rough decision trees for all classes, the method also tends to retain the merits of bootstrap, boosting and subspace. Experimental results and comparisons on some hard gene expression data show the attractiveness of the method.

Jinmao Wei, Shuqin Wang, Guoying Wang
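
The 1-vs-others decomposition described above can be approximated with off-the-shelf components, as in the following sketch; scikit-learn's generic decision trees stand in for the paper's rough decision trees, and the dataset is just a placeholder.

```python
# One binary tree per class, each trained on the full data with a
# binarized ("this class vs. the rest") target.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
forest = OneVsRestClassifier(DecisionTreeClassifier(max_depth=3))
forest.fit(X, y)              # all training samples reused for every class's tree
print(forest.predict(X[:5]))
```
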
Knowledge Reduction in Random Incomplete Information Systems via Evidence Theory

Knowledge reduction is one of the main problems in the study of rough set theory. This paper deals with knowledge reduction in random incomplete information systems based on Dempster-Shafer theory of evidence. The concepts of random belief reducts and random plausibility reducts in random incomplete information systems are introduced. The relationships among the random belief reduct, the random plausibility reduct, and the classical reduct are examined. It is proved that, in a random incomplete information system, an attribute set is a random belief reduct if and only if it is a classical reduct, and a random plausibility consistent set must be a consistent set.

Wei-Zhi Wu
Knowledge Reduction Based on Granular Computing from Decision Information Systems

Efficient knowledge reduction in large inconsistent decision information systems is a challenging problem, and existing approaches still have their own limitations. To address these problems, this article applies the technique of granular computing, provides some rigorous and detailed proofs, and discusses the relationship between the introduced granular reduct and knowledge reduction based on the positive region in simplified decision information systems. Using radix sorting and hash methods, object granules are employed as the basic processing elements to investigate knowledge reduction. The proposed method can be applied to both consistent and inconsistent decision information systems.

Lin Sun, Jiucheng Xu, Shuangqun Li
Pattern Classification Using Class-Dependent Rough-Fuzzy Granular Space

The article describes a new rough-fuzzy model for pattern classification. Here, class-dependent granules are formulated in a fuzzy environment that preserves better class discriminatory information. Neighborhood rough sets (NRS) are used to select a subset of granulated features that explore local/contextual information from neighbor granules. The model thus exploits the mutual advantages of class-dependent fuzzy granulation and NRS, which is useful in pattern classification with overlapping classes. The superiority of the proposed model to other similar methods is demonstrated with both completely and partially labeled data sets using various performance measures. The proposed model learns well even with a lower percentage of training samples, which makes the system fast.

Sankar K. Pal, Saroj K. Meher, Soumitra Dutta
Generate (F, ε)-Dynamic Reduct Using Cascading Hashes

Dynamic reducts with large stability coefficients are good candidates for decision rule generation, but they are time-consuming to generate. This paper presents an algorithm, dReducts, that uses a cascading hash function to generate (F, ε)-dynamic reducts. With the cascading hash function, an F-dynamic reduct can be generated in O(m²n) time with O(mn) space, where m and n are the total number of attributes and the total number of instances of the table, respectively. Empirical results of generating (F, ε)-dynamic reducts using five of the ten most popular UCI datasets are presented and compared to the Rough Set Exploration System (RSES).

Pai-Chou Wang
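
The role of hashing in fast reduct computation can be illustrated as follows: group the objects by their values on a candidate attribute subset in a single pass, then check decision consistency per group. This is a generic sketch with invented data, not the paper's dReducts algorithm or its cascading hash function.

```python
# Hash-based consistency test: O(n) grouping per candidate attribute subset.
from collections import defaultdict

def is_consistent(rows, attrs, decision):
    """True if objects agreeing on attrs always agree on the decision."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in attrs)   # hash of the attribute values
        groups[key].add(row[decision])
    return all(len(d) == 1 for d in groups.values())

rows = [{"a": 1, "b": 0, "d": "yes"},
        {"a": 1, "b": 1, "d": "no"},
        {"a": 0, "b": 1, "d": "no"}]
print(is_consistent(rows, ["a"], "d"))        # False: a=1 maps to both decisions
print(is_consistent(rows, ["a", "b"], "d"))   # True: {a, b} separates the decisions
```
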
Incorporating Great Deluge with Kempe Chain Neighbourhood Structure for the Enrolment-Based Course Timetabling Problem

In general, course timetabling refers to the assignment process that assigns events (courses) to given rooms and timeslots subject to a list of hard and soft constraints. It is a challenging task for educational institutions. In this study we employ a great deluge algorithm with the Kempe chain neighbourhood structure as an improvement algorithm. The Round Robin (RR) algorithm is used to control the selection of neighbourhood structures within the great deluge algorithm. The performance of our approach is tested on eleven benchmark datasets (representing one large, five medium and five small problems). Experimental results show that our approach is able to generate competitive results when compared with previously available approaches. Possible extensions of this simple approach are also discussed.

Salwani Abdullah, Khalid Shaker, Barry McCollum, Paul McMullan
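
For readers unfamiliar with the great deluge metaheuristic used above, here is a minimal sketch of its acceptance rule on an invented toy problem; the neighbourhood move is a random step standing in for a Kempe chain swap, and all parameters are illustrative.

```python
# Great deluge: accept a candidate if it improves on the current solution
# or stays under a steadily falling "water level".
import random

def great_deluge(initial, cost, neighbour, iters=10000, decay=None):
    best = current = initial
    level = cost(initial)
    decay = decay or level / iters      # linear "rain speed"
    for _ in range(iters):
        cand = neighbour(current)
        if cost(cand) <= cost(current) or cost(cand) <= level:
            current = cand
            if cost(current) < cost(best):
                best = current
        level -= decay                   # the water level keeps dropping
    return best

# Toy 1-D problem: minimise |x - 42| by random unit steps.
sol = great_deluge(0, lambda x: abs(x - 42),
                   lambda x: x + random.choice([-1, 1]))
print(sol)
```
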
Ordered Weighted Average Based Fuzzy Rough Sets

Traditionally, membership in the fuzzy-rough lower, resp. upper approximation is determined by looking only at the worst, resp. best performing object. Consequently, when applied to data analysis problems, these approximations are sensitive to noisy and/or outlying samples. In this paper, we advocate a mitigated approach, in which membership in the lower and upper approximation is determined by means of an aggregation process using ordered weighted average (OWA) operators. In comparison to the previously introduced vaguely quantified rough set model, which is based on a similar rationale, our proposal has the advantage that the approximations are monotonic w.r.t. the fuzzy indiscernibility relation used. Initial experiments involving a feature selection application confirm the potential of the OWA-based model.

Chris Cornelis, Nele Verbiest, Richard Jensen
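
A small numeric sketch of the softening idea: an OWA operator with weights biased toward the smallest values behaves like a noise-tolerant minimum, so a single outlying membership value no longer dictates the lower approximation. The weight vector and membership values are invented.

```python
# OWA aggregation acts on the *ordered* values, not the raw ones.

def owa(values, weights):
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

vals = [0.9, 0.85, 0.8, 0.1]               # 0.1 looks like a noisy outlier
print(min(vals))                           # strict min: 0.1, fully noise-driven
print(owa(vals, [0.0, 0.1, 0.3, 0.6]))     # OWA "soft min": 0.385, outlier damped
```
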
On Attribute Reduction of Rough Set Based on Pruning Rules

Combining the concepts of attribute dependence and attribute similarity in rough sets, pruning ideas for attribute reduction are proposed, an estimation method and a fitness function for the reduction process are designed, and a new reduction algorithm based on pruning rules is developed; its complexity is analyzed, and many examples are given. The experimental results demonstrate that the developed algorithm can obtain the simplest reduction.

Hongyuan Shen, Shuren Yang, Jianxun Liu
Set-Theoretic Models of Granular Structures

Granular structure is one of the fundamental concepts in granular computing. Different granular structures reflect multiple aspects of knowledge and information, and depict the different characteristics of data. This paper investigates a family of set-theoretic models of different granular structures. The proposed models are particularly useful for concept formulation and learning. Some of them can be used in formal concept analysis, rough set analysis and knowledge spaces. This unified study of granular structures provides a common framework integrating these theories of granular computing.

Yiyu Yao, Duoqian Miao, Nan Zhang, Feifei Xu
A Robust Fuzzy Rough Set Model Based on Minimum Enclosing Ball

Fuzzy rough set theory was introduced as a useful mathematical tool for handling real-valued data. Unfortunately, its sensitivity to noise greatly limits its application in the real world, so it is necessary to enhance the robustness of fuzzy rough sets. In this work, we introduce a robust model of fuzzy rough sets based on the minimum enclosing ball problem. In addition, we define a robust fuzzy dependency function and apply it to evaluate features corrupted by noise. Experimental results show that the new model is robust to noise.

Shuang An, Qinghua Hu, Daren Yu
Indiscernibility and Similarity in an Incomplete Information Table

A new method is proposed for interpreting and constructing relationships between objects in an incomplete information table. An incomplete information table is expressed as a family of complete tables. One can define an indiscernibility relation for each complete table and thus obtain a family of indiscernibility relations over all complete tables. A pair consisting of an indiscernibility relation and a similarity relation is constructed from the intersection and union of this family of indiscernibility relations. This provides a clear semantic interpretation of the relationships between objects of an incomplete information table; in fact, it is a pair of bounds of the actual indiscernibility relation that would hold if all values in the incomplete table were known.

Renpu Li, Yiyu Yao
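
The construction above can be traced on a tiny example: enumerate the completions of a missing value, build each completion's indiscernibility relation, and take the intersection and union as the two bounds. The table, domain, and attribute layout are invented.

```python
# One completion per possible filling of the unknown value (None).
from itertools import product

table = [("a", 1), ("a", None), ("b", 1)]   # None = unknown value
domain = [0, 1]                              # possible values for the gap

def indisc(rows):
    n = len(rows)
    return {(i, j) for i in range(n) for j in range(n) if rows[i] == rows[j]}

relations = []
for v in domain:                             # one complete table per filling
    rows = [(a, v if b is None else b) for a, b in table]
    relations.append(indisc(rows))

certain = set.intersection(*relations)       # indiscernible in every completion
possible = set.union(*relations)             # indiscernible in some completion
diag = {(i, i) for i in range(3)}
print(sorted(certain - diag))                # [] : no pair is certainly indiscernible
print(sorted(possible - diag))               # [(0, 1), (1, 0)] : possibly indiscernible
```
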
A New Fitness Function for Solving Minimum Attribute Reduction Problem

The problem of minimum attribute reduction is formally a nonlinearly constrained combinatorial optimization problem and has been proved to be NP-hard. A most commonly used approach for dealing with the problem is to first transform it into a fitness maximization problem over a Boolean space and then to solve the latter via stochastic optimization methods. However, existing fitness functions either fail to assure in theory the optimality equivalence between the original problem and the fitness maximization problem or are not quite adequate in terms of fitness evaluation. In this paper, a new fitness function that overcomes the drawbacks of the existing fitness functions is given. The optimality equivalence using the proposed fitness function is theoretically proved. Comparisons are made experimentally between the proposed fitness function and two other typical fitness functions by using two recent attribute reduction algorithms. Experimental results show that for each algorithm in test, the proposed fitness function outperforms the other two in terms of both the probability of finding a minimum reduct and the average length of the output reducts.

Dongyi Ye, Zhaojiong Chen, Shenglan Ma
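
For context, the transformation described above typically scores a Boolean attribute mask with a fitness that trades off subset size against classification quality. The weighted-sum form below is a common textbook choice shown only for orientation; it is not the paper's new fitness function, and the dependency measure is a toy stand-in.

```python
# Fitness over the Boolean space {0,1}^m: reward short masks and high quality.

def fitness(bits, dependency, alpha=0.9):
    """bits: 0/1 attribute mask; dependency(bits) in [0, 1] measures quality."""
    m = len(bits)
    shortness = (m - sum(bits)) / m
    return alpha * dependency(bits) + (1 - alpha) * shortness

toy_dep = lambda bits: 1.0 if bits[0] == 1 else 0.4   # invented dependency measure
print(fitness([1, 0, 0, 0], toy_dep))                  # favors small, consistent masks
```
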
Temporal Dynamics in Rough Sets Based on Coverings

Given a covering of a universe, we study how time evolution of the covering influences rough approximations and covering reduction. The definition of time evolution considers two cases: new objects are added to the system or the granules are changed.

Davide Ciucci
Data Classification Using Rough Sets and Naïve Bayes

The Naïve Bayesian classifier is one of the most effective and efficient classification algorithms. The elegant simplicity and apparent accuracy of naive Bayes (NB), even when the independence assumption is violated, fosters ongoing interest in the model. Rough set theory has been used for different tasks in knowledge discovery and successfully applied to many real-life problems. In this study we make use of the ability of rough sets to discover attribute dependencies in order to overcome NB's impractical independence assumption. We propose a new algorithm called Rough-Naive Bayes (RNB) that is expected to outperform other current NB variants. RNB adjusts attribute weights based on their dependencies and contribution to the final decision. Experimental results show that RNB can achieve better performance than the NB classifier.

Khadija Al-Aidaroos, Azuraliza Abu Bakar, Zalinda Othman
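
The weighting idea can be sketched as a naive Bayes score in which each attribute's log-likelihood is scaled by a weight; the weights and probabilities below are invented, whereas RNB derives its weights from rough-set attribute dependencies.

```python
# Weighted naive Bayes score: log P(c) + sum_i w_i * log P(x_i | c).
# With all w_i = 1 this reduces to standard naive Bayes.
import math

def weighted_nb_score(prior, likelihoods, weights):
    return math.log(prior) + sum(w * math.log(p)
                                 for w, p in zip(weights, likelihoods))

# Second attribute is down-weighted, e.g. because it is redundant.
print(weighted_nb_score(0.5, [0.8, 0.3], [1.0, 0.2]))
```
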
A Heuristic Reduction Algorithm in IIS Based on Binary Matrix

A binary discernibility matrix is presented in this paper, upon which a binary matrix-based heuristic reduction algorithm for incomplete information systems (IIS) is proposed. In the proposed algorithm, the problem of finding an attribute reduction is converted into the problem of searching for a set of binary matrices that cover the objective binary matrix. The heuristic function is defined by the discernibility matrix associated with each condition attribute, which denotes the classification significance of that attribute, and the attribute reduct is constructed by adding attributes in order of attribute significance. An example of an incomplete information system is presented to illustrate the algorithm and its validity.

Huaxiong Li, Xianzhong Zhou, Meimei Zhu
Generalized Distribution Reduction in Inconsistent Decision Systems Based on Dominance Relations

By incorporating the dominance principle in inconsistent decision systems based on dominance relations, two new types of distribution reductions are proposed, i.e., the generalized distribution reduction and the generalized maximum distribution reduction, and their properties and relationship are discussed. The corresponding generalized distribution discernibility matrix is then defined to provide a convenient computational method for obtaining the generalized distribution reductions. The validity of this method is shown by both theoretical proofs and illustrative examples.

Yan Li, Jin Zhao, Na-Xin Sun, Sankar Kumar Pal
Towards Multi-adjoint Property-Oriented Concept Lattices

In this paper we present some properties of adjoint triples when dual supports are considered. These results are used to generalize the classical property-oriented concept lattices, which themselves embed rough set theory. Specifically, we define a fuzzy environment based on the philosophy of the multi-adjoint paradigm, related to the multi-adjoint concept lattice. As a consequence, properties can be transferred from one framework to the other.

Jesús Medina
Extension of Covering Approximation Space and Its Application in Attribute Reduction

The concept of the complement of a covering is introduced first, and then the complement space and extended space of a covering approximation space are defined based on it. It is proved that a covering approximation space generates the same covering lower and upper approximations as its complement space and extended space if the covering degenerates to a partition. Moreover, the extended space of a covering approximation space often generates a bigger covering lower approximation or a smaller covering upper approximation than the space itself. By extending each covering in a covering decision system, the classification ability of each covering is improved. Thus, a heuristic reduction algorithm is developed to eliminate some coverings in a covering decision system without decreasing the classification ability of the system for decision. Theoretical analysis and an illustrative example indicate that this algorithm can obtain shorter reducts than other algorithms.

Guoyin Wang, Jun Hu
A New Extended Dominance Relation Approach Based on Probabilistic Rough Set Theory

A new extended dominance relation is proposed for multi-attribute decision making problems with preference and incomplete information. First, the concept of a new extended dominance relation based on the possibility of a dominance relation between two objects is proposed. It substitutes the possibility of dominance for the conditions of the limited extended dominance relation, and it restricts the case in which the attribute values of both objects are missing on the same attribute. Using probabilistic rough set theory, the threshold for the new extended dominance relation is estimated; it is determined by the loss, i.e., the risk or cost, of the classification actions with respect to different states. Finally, the effectiveness of the new extended dominance relation is validated by experimental studies.

Decui Liang, Simon X. Yang, Chaozhe Jiang, Xiangui Zheng, Dun Liu
An Equivalent Form of Rough Logic System RSL

First, based on regular double Stone algebra, a new kind of rough logic system is established and some of its properties are obtained. Second, we prove that it is an equivalent form of the rough logic system RSL.

Yingchao Shao, Zhongmin Xie, Keyun Qin

Fuzzy Sets

Conceptual Reduction of Fuzzy Dual Concept Lattices

In this paper we discuss the conceptual reduction of fuzzy dual concept lattices. Three pairs of operators in a fuzzy formal context are introduced. Based on the proposed operators, we present three types of variable threshold dual concept lattices, and discuss their properties and relations. The results show that the number of concepts in a variable threshold dual concept lattice is less than that in a fuzzy dual concept lattice, and the important concepts are preserved.

Xiao-Xue Song, Wen-Xiu Zhang, Qiang Zhao
Qualitative Approximations of Fuzzy Sets and Non-classical Three-Valued Logics (I)

(0,1)-Qualitative approximations of fuzzy sets are studied by using the core and support of a fuzzy set. This setting naturally leads to three disjoint regions and an analysis based on a three-valued logic. This study combines both an algebra view and a logic view. From the algebra view, the mathematical definition of a (0,1)-approximation of fuzzy sets is given, and algebraic operations based on various t-norms and fuzzy implications are established. From the logic view, a non-classical three-valued logic is introduced. Corresponding to this new non-classical three-valued logic, the related origins of t-norms and fuzzy implications are examined.

Xiaohong Zhang, Yiyu Yao, Yan Zhao
Qualitative Approximations of Fuzzy Sets and Non-classical Three-Valued Logics (II)

(α, β)-qualitative approximations of fuzzy sets are studied by using a pair of subsets defined by an α-cut and a strong β-cut. This setting naturally leads to three disjoint regions and an analysis based on a three-valued logic. This study combines both an algebra view and a logic view. From the algebra view, the mathematical definition of an (α, β)-approximation of fuzzy sets is given, and algebraic operations based on various t-norms and fuzzy implications are established. From the logic view, two non-classical three-valued logics are introduced. Corresponding to these new non-classical three-valued logics, the related origins of t-norms and fuzzy implications are examined.

Xiaohong Zhang, Yiyu Yao, Yan Zhao
Implication Operator of Linguistic Truth-Valued Intuitionistic Fuzzy Lattice

We construct a linguistic truth-valued intuitionistic fuzzy lattice based on linguistic truth-valued lattice implication algebra to deal with linguistic truth values. The proposed system can better express both comparable and incomparable information, and it can simultaneously handle both positive and negative evidence represented by linguistic truth values during information processing. This paper discusses the intuitionistic fuzzy implication operator on the linguistic truth-valued intuitionistic fuzzy lattice and its properties.

Chunying Guo, Fengmei Zhang, Li Zou, Kaiqi Zou
Robust Granular Neural Networks, Fuzzy Granules and Classification

We introduce a robust granular neural network (RGNN) model based on the multilayer perceptron, trained with the back-propagation algorithm, for fuzzy classification of patterns. We provide a development strategy for the network based mainly on the input vector, linguistic connection weights and target vector. While the input vector is described in terms of fuzzy granules, the target vector is defined in terms of class membership values and zeros. The connection weights among nodes of the RGNN are linguistic variables, whose values are updated by adding two linguistic hedges; the updated linguistic variables are called generalized linguistic variables. The node functions of the RGNN are defined in terms of linguistic arithmetic operations. We present experimental results on several real-life data sets. Our results show that the classification performance of the RGNN is superior to that of other networks of similar type.

G. Avatharam, Sankar K. Pal
Perturbed Iterative Approximation of Common Fixed Points on Nonlinear Fuzzy and Crisp Mixed Family Operator Equation Couples in Menger PN-Spaces

In this paper, based on the concept of the probabilistic (ϕ, ψ)-contractor couple introduced by Mihet, a new class of nonlinear operator equation couples with a mixed family of fuzzy and crisp operator equations in Menger probabilistic normed spaces (briefly, Menger PN-spaces) is introduced and studied. Further, some new iterative algorithms are constructed, and the existence of solutions for the nonlinear operator equation couples and the convergence of the iterative sequences generated by the algorithms are discussed under a larger class of t-norms and joint orbitally complete conditions.

Heng-you Lan, Tian-xiu Lu, Huang-lin Zeng, Xiao-hong Ren
Improving the Learning of Recurring Concepts through High-Level Fuzzy Contexts

In data stream classification, the problem of recurring concepts is a special case of concept drift where the underlying concepts may reappear. Several methods have been proposed for learning in the presence of concept drift, but few consider recurring concepts and context integration. To address these issues, we previously presented a method that stores learned models along with context information from the corresponding learning period. When concepts recur, the appropriate model is reused, avoiding relearning a previously seen concept. In this work, in order to model the vagueness and uncertainty associated with context, we propose the inference of high-level fuzzy contexts from fuzzy logic rules, where the conditions result from fuzzified context inputs. We also present the changes required for our method to deal with this new representation, extending the approach to handle uncertain contexts.

João Bártolo Gomes, Ernestina Menasalvas, Pedro A. C. Sousa

Knowledge Technology

A Frequent Pattern Mining Method for Finding Planted (l, d)-motifs of Unknown Length

Identification and characterization of gene regulatory binding motifs is one of the fundamental tasks toward systematically understanding the molecular mechanisms of transcriptional regulation. Recently, the problem has been abstracted as the challenging planted (l, d)-motif problem. Previous studies have developed numerous methods to solve the problem, but most of them need to specify the length l of a motif in advance. In this study, we present an exact and efficient algorithm, called Apriori-Motif, that does not require l to be given. The algorithm uses breadth-first search and prunes the search space quickly by the downward closure property used in Apriori, a classical algorithm of frequent pattern mining. An empirical study shows that Apriori-Motif outperforms some existing methods.

Caiyan Jia, Ruqian Lu, Lusheng Chen
A Quick Incremental Updating Algorithm for Computing Core Attributes

Computing core attributes is one of the key problems of rough set theory, and many algorithms for computing the core have been proposed. Unfortunately, most of them are designed for static databases, while many real datasets are dynamic. In this paper, a quick incremental core-updating algorithm is proposed, which relies only on the updated parts of the discernibility matrix and does not need to store, re-compute or re-analyze the discernibility matrix when new objects are added to the decision table. Both theoretical analysis and experimental results show that the algorithm is effective and efficient.

Hao Ge, Chuanjian Yang, Wanlian Yuan
Using Lexical Ontology for Semi-automatic Logical Data Warehouse Design

The Data Warehouse has become a very important tool for data analysis, but the conversion from an operational database to a data warehouse is very complex. This paper presents a new method to design a data warehouse multidimensional model based on a supply-driven framework. We propose a semi-automatic tool using a lexical ontology as the knowledge domain. The tool is able to identify fact and dimension tables using the knowledge domain. Once the fact table is identified, the following step is to generate the data warehouse multidimensional model with minimal user interaction. The feasibility of the proposed method is demonstrated by a prototype using WordNet as the knowledge domain.

Mior Nasir Mior Nazri, Shahrul Azman Noah, Zarinah Hamid
Likelihood-Based Sampling from Databases for Rule Induction Methods

This paper introduces the idea of the log-likelihood ratio to measure the similarity between generated training samples and original training samples. The ratio is used as a test statistic to determine whether the statistical information of the generated training samples (S_k) is almost equivalent to that of the original training samples (S_0), denoted by S_0 ≃ S_k. If the test statistic obtained rejects the hypothesis S_0 ≃ S_k, then these samples are abandoned. Otherwise, the generated samples are accepted and rule induction methods or statistical methods are applied. This method was evaluated on three medical domains. The results show that the proposed method selects training samples which reflect the statistical characteristics of the original training samples, although the performance with small samples is not so good.

Shusaku Tsumoto, Shoji Hirano, Hidenao Abe
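
A G-style log-likelihood-ratio statistic of the kind described above can be computed as follows; the class proportions and sample counts are invented, and the concrete statistic used in the paper may differ in form.

```python
# G statistic: 2 * sum O * ln(O / E), compared to a chi-square threshold.
# Large values reject the hypothesis S_0 ≃ S_k and the sample is abandoned.
import math

def g_statistic(observed, expected):
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

orig_props = [0.5, 0.3, 0.2]                       # class proportions in S_0
sample = [40, 35, 25]                              # class counts in S_k (n = 100)
expected = [p * sum(sample) for p in orig_props]
print(g_statistic(sample, expected))               # ~4.10 here
```
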
Residual Analysis of Statistical Dependence in Multiway Contingency Tables

A Pearson residual is defined as the residual between the actual value and the expected value of each cell in a contingency table. This paper shows that this residual can be represented as a linear sum of 2×2 determinants, which suggests that the geometrical nature of the residuals can be viewed through Grassmannian algebra.

Shusaku Tsumoto, Shoji Hirano
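
A worked example of the Pearson residual for each cell of an invented 2×2 table, computed against the counts expected under independence:

```python
# Pearson residual per cell: (observed - expected) / sqrt(expected),
# with expected = row_total * column_total / n under independence.
import math

table = [[30, 10],
         [20, 40]]
n = sum(sum(row) for row in table)                  # 100
row_tot = [sum(row) for row in table]               # [40, 60]
col_tot = [sum(col) for col in zip(*table)]         # [50, 50]

for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / n
        residual = (table[i][j] - expected) / math.sqrt(expected)
        print(f"cell ({i},{j}): residual {residual:+.2f}")   # e.g. (0,0): +2.24
```
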
A Note on the Effect of Knowledge Refinement on Bag Structures

In this paper, the author discusses the notion of bags and describes a conceptual model of an information system in which the effect of knowledge refinement on bag structure can be clearly observed. In this context, the indiscernibility relation is defined over one or more attributes of a collection of objects of the same type. The author further defines attributive indiscernibility relations and their types.

Kankana Chakrabarty
A Belief Structure for Reasoning about Knowledge

A logic-based belief structure is proposed for knowledge representation and reasoning. This structure is semantically different from the standard Kripke structure. It is demonstrated that such a representation of knowledge is particularly useful in a multi-agent environment.

The proposed model is also suitable for dealing with inconsistent and incomplete information and provides a natural measure of uncertainty for the knowledge modal operators.

S. K. M. Wong, Nasser Noroozi
Research on Mapping Mechanism of Learning Expression

In this paper, we propose a description of a Machine Learning System (MLS) based on category theory, in order to study the mapping mechanism of learning expression, and a detailed instance is provided: Decision Tree Learning (DTL) can be expressed in terms of category theory. In addition, we prove that the machine learning system is indeed a category, and propose the learning expression and its mapping mechanism based on category theory. We also describe the proof and verify the mechanism.

Lili Zhou, Fanzhang Li
Linking Open Spatiotemporal Data in the Data Clouds

The Linked Data Clouds are the best expression and realization of the Semantic Web vision to date. The Data Clouds can significantly benefit both the AI and Semantic Web communities by enabling new classes of tasks and enhancing reasoning, data mining and knowledge discovery applications. There are various forms of spatiotemporal data in the Linked Data Clouds, and to utilize them efficiently the schema heterogeneity problem must be resolved. A hierarchical spatiotemporal model is proposed to give a conceptual description of the different spatiotemporal datasets of the Linked Data Clouds. The model supports schema-level mappings and conveys relationships between concepts of different datasets at the schema level. It contains three layers: a meta level for abstract spacetime knowledge; a schema level for well-known models in spatial and temporal reasoning (Allen's Interval Algebra for time and the RCC model for space); and an instantiation level providing systematic mappings and formal descriptions of the various ground spatiotemporal statements.

He Hu, Xiaoyong Du
Review of Software Security Defects Taxonomy

An organized list of actual defects can be useful for software security testing (SST). In order to target their technology on a rational basis, it would be useful for security testers to have available a taxonomy of software security defects organizing the problem space. Unfortunately, the only existing suitable taxonomies are mostly aimed at tool-builders and software designers, or are based on vulnerabilities and security errors, and do not adequately represent the security defects found in modern software. In our work, we have reviewed the traditional taxonomies of software security errors and vulnerabilities. Based on an analysis of their targets, motivations and shortcomings, we have compared nine taxonomies that may be useful for defect-based software security testing.

Zhanwei Hui, Song Huang, Zhengping Ren, Yi Yao
A New Hybrid Method of Generation of Decision Rules Using the Constructive Induction Mechanism

Our research is devoted to developing a new method for generating a set of decision rules. The method combines two different mechanisms. The first is based on applying a new constructive induction algorithm, which uses belief networks, to the investigated dataset; its aim is to find the most important descriptive attribute, computed on the basis of the other attributes. The second part of the presented method is an improvement algorithm used in the optimization of the gathered rule set. The results of our research contain a comparison of classification efficiency on several datasets.

Wiesław Paja, Krzysztof Pancerz, Mariusz Wrzesień

Intelligent Information Processing

An Effective Principal Curves Extraction Algorithm for Complex Distribution Dataset

This paper proposes a new method for finding principal curves in complex distribution datasets. Existing methods do not perform well on complex distribution datasets with high curvature, high dispersion and self-intersections, such as spiral-shaped curves. Firstly, a rudimentary principal graph of the dataset is created based on a thinning algorithm, and then contiguous vertices are merged. Finally, the fitting-and-smoothing step introduced by Kégl is improved to optimize the principal graph, and Kégl's restructuring step is used to rectify imperfections of the principal graph. Experimental results indicate the effectiveness of the proposed method for finding principal curves in complex distribution datasets.

Hongyun Zhang, Duoqian Miao, Lijun Sun, Ying Ye
Parallel Reducts Based on Attribute Significance

In this paper, we focus on how to obtain parallel reducts. We present a new method based on a matrix of attribute significance, by which we can obtain parallel reducts as well as dynamic reducts. We prove the validity of our method in theory, and its time complexity is polynomial. Experiments show that our method retains the advantages of dynamic reducts.

Dayong Deng, Dianxun Yan, Jiyi Wang
A Rough Sets Approach to User Preference Modeling

User preference modeling is one of the challenging issues in intelligent information systems. Extensive research has been performed to automatically analyze user preference and utilize it, but one problem remains: existing approaches cannot deal with semantic preference representation and uncertain data at the same time. To overcome this problem, this paper proposes a rough set approach to user preference modeling. A family of atomic preference granules supporting semantics in knowledge space and two operators, a vertical aggregation operator ⊙ and a horizontal combination operator ⊕, are defined to represent user preference. Based on this, a rough granular computing framework is constructed for analyzing user preference. Experimental results show that the proposed model performs well in recommendation tests.

Siyuan Jing, Kun She
A Tool for Study of Optimal Decision Trees

The paper describes a tool which, for relatively small decision tables, allows us to perform consecutive optimization of decision trees with respect to various complexity measures, such as number of nodes, average depth, and depth, and to find the parameters and the number of optimal decision trees.

Abdulaziz Alkhalid, Igor Chikalov, Mikhail Moshkov
Automatic Part of Speech Tagging for Arabic: An Experiment Using Bigram Hidden Markov Model

Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. A POS tagger is a useful preprocessing tool in many natural language processing (NLP) applications such as information extraction and information retrieval. In this paper, we present preliminary results of a bigram Hidden Markov Model (HMM) applied to the POS tagging problem for Arabic. In addition, we have used different smoothing algorithms with the HMM model to overcome the data sparseness problem. The Viterbi algorithm is used to assign the most probable tag to each word in the text. Furthermore, several lexical models have been defined and implemented to handle unknown-word POS guessing based on word substrings, i.e., prefix probability, suffix probability or a linear interpolation of both. The average overall accuracy of this tagger is 95.8%.

Mohammed Albared, Nazlia Omar, Mohd. Juzaiddin Ab Aziz, Mohd Zakree Ahmad Nazri
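
A minimal Viterbi decoder for a bigram HMM, the core machinery named above; the two-tag model and toy probabilities are invented, and a real tagger would estimate smoothed probabilities from a corpus as the abstract describes.

```python
# Viterbi: pick the tag sequence maximizing transition * emission products.

def viterbi(words, tags, start, trans, emit):
    V = [{t: start[t] * emit[t].get(words[0], 1e-6) for t in tags}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: V[-1][p] * trans[p][t])
            col[t] = V[-1][prev] * trans[prev][t] * emit[t].get(w, 1e-6)
            ptr[t] = prev
        V.append(col)
        back.append(ptr)
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for ptr in reversed(back):       # follow back-pointers to recover the sequence
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ["N", "V"]                    # invented toy model
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"birds": 0.4, "fly": 0.1}, "V": {"fly": 0.5}}
print(viterbi(["birds", "fly"], tags, start, trans, emit))  # ['N', 'V']
```
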
Application of Rough Sets Theory in Air Quality Assessment

This paper analyses rough set approaches to air quality assessment in a given locality of the Czech Republic (CR). The original data for modeling were obtained from daily observations of air-polluting substance concentrations in the town of Pardubice. Two applications were used to compute decision rules, and the achieved results are compared and analysed. The output of this paper is a proposal for how to assign an air quality index (AQI) to the selected locality on the basis of various attributes.

Pavel Jirava, Jiri Krupka, Miloslava Kasparova
An Interactive Approach to Outlier Detection

In this paper we describe an interactive approach for finding outliers in big sets of records, such as those collected by banks, insurance companies, and web shops. The key idea behind our approach is the usage of an easy-to-compute and easy-to-interpret outlier score function. This function is used to identify a set of potential outliers. The outliers, organized in clusters, are then presented to a domain expert, together with some context information, such as the characteristics of clusters and the distribution of scores. Consequently, they are analyzed, labelled as non-explainable or explainable, and removed from the data. The whole process is iterated several times, until no more interesting outliers can be found.

R. M. Konijn, W. Kowalczyk

Health Informatics and Biometrics Authentication

Rules for Ontology Population from Text of Malaysia Medicinal Herbs Domain

The primary goal of ontology development is to share and reuse domain knowledge among people or machines. This study focuses on extracting semantic relationships from unstructured textual documents related to medicinal herbs from websites, and proposes a lexical pattern technique to acquire semantic relationships such as synonymy, hyponymy, and part-of relationships. Nine object properties (or relations) and 105 lexico-syntactic patterns were identified manually, including one from the Hearst hyponym rules. The lexical patterns have linked 7252 terms that have potential as ontological terms. Based on this study, it is believed that determining the lexical patterns at an early stage is helpful in selecting relevant terms from a wide collection of terms in the corpus. However, the relations and lexico-syntactic patterns or rules have to be verified by a domain expert before being applied to the wider collection in an attempt to find more possible rules.

Zaharudin Ibrahim, Shahrul Azman Noah, Mahanem Mat Noor
Gait Recognition Based on Outermost Contour

Gait recognition aims to identify people by the way they walk. In this paper, a simple but effective gait recognition method based on the Outermost Contour is proposed. For each gait image sequence, an adaptive silhouette extraction algorithm is first used to segment the images, and a series of postprocessing steps is applied to obtain normalized silhouettes with less noise. Then a novel feature extraction method based on the Outermost Contour is proposed. Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA) are adopted to reduce the dimensionality of the feature vectors and to simultaneously optimize the class separability of different gait image sequences. Two simple pattern classification methods are used on the low-dimensional eigenspace for recognition. Experimental results on a gait database of 100 people show that the accuracy of our algorithm reaches 97.67%.

Lili Liu, Yilong Yin, Wei Qin
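
The PCA + MDA reduction stage can be sketched with scikit-learn, using LinearDiscriminantAnalysis as a stand-in for MDA; the data shapes, class counts, and the 1-NN classifier are invented placeholders.

```python
# PCA compresses the feature vectors; LDA then maximizes class separability;
# a simple nearest-neighbour classifier works in the resulting low-D space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))       # 100 gait feature vectors, 200-D (invented)
y = rng.integers(0, 5, size=100)      # 5 subjects (invented)

clf = make_pipeline(PCA(n_components=20),
                    LinearDiscriminantAnalysis(n_components=4),
                    KNeighborsClassifier(n_neighbors=1))
clf.fit(X, y)
print(clf.predict(X[:3]))
```
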
Pseudofractal 2D Shape Recognition

Since their discovery, fractals have found a great number of applications, one of which is fractal recognition. In this paper we present some weaknesses of fractal recognition methods and show how to eliminate them using a pseudofractal approach. Moreover, we introduce a new recognition method for 2D shapes which uses the fractal dependence graph introduced by Domaszewicz and Vaishampayan in 1995. The effectiveness of our approach is shown on two test databases.

Krzysztof Gdawiec
Classification of MMPI Profiles of Patients with Mental Disorders – Experiments with Attribute Reduction and Extension

In the research presented in the paper we try to find efficient methods for classification of MMPI profiles of patients with mental disorders. Each profile is described by a set of values of thirteen attributes (scales). Patients can be classified into twenty categories concerning nosological types. It is possible to improve classification accuracy by reduction or extension of the number of attributes with relation to the original data table. We test several techniques of reduction and extension. Experiments show that the proposed attribute extension approach improves classification accuracy, especially in the case of discretized data.

Jerzy Gomuła, Krzysztof Pancerz, Jarosław Szkoła
Automatic 3D Face Correspondence Based on Feature Extraction in 2D Space

The present study proposes to solve 3D face correspondence in 2D space. Texture maps of 3D models are first generated by unwrapping the 3D faces to 2D space. Following this, we build planar templates based on the 2D average shape computed from a group of annotated texture maps. Landmarks on the unwrapped texture images are located automatically, and the final correspondence is built according to the templates. Experimental results show that the presented method is stable and achieves good matching accuracy.

Xun Gong, Shuai Cao, Xinxin Li, Ping Yuan, Hemin Zou, Chao Ye, Junyu Guo, Chunyao Wang
The Neuropathological Diagnosis of the Alzheimer’s Disease under the Consideration of Verbal Decision Analysis Methods

The incidence and prevalence of Alzheimer's disease double every five years, with a prevalence of 40% among people over 85 years of age, and there is a great challenge in identifying the pathology in its earliest stages. The main focus of this work is to aid decision making on the diagnosis of Alzheimer's disease, applying the ORCLASS method and the Aranaú Tool. The modeling and evaluation processes were conducted based on bibliographic sources and on information given by a medical expert.

Isabelle Tamanini, Plácido Rogério Pinheiro, Mirian Calíope D. Pinheiro
Autonomous Adaptive Data Mining for u-Healthcare

Ubiquitous healthcare requires intelligence in order to be able to react to different patients' needs. The context and resource constraints of ubiquitous devices demand a mechanism able to estimate the cost of the data mining algorithm providing the intelligence. The performance of an algorithm is independent of the semantics; that is to say, knowing the input of an algorithm, its performance can be calculated. Under this assumption we present a formalization of a mechanism able to estimate the cost of an algorithm in terms of efficacy and efficiency. Further, an instantiation of the mechanism for an application predicting glucose levels for diabetic patients is presented.

Andrea Zanda, Santiago Eibe, Ernestina Menasalvas
Fast Iris Localization Based on Improved Hough Transform

The iris is a biometric that has emerged in recent years. Iris identification is gradually being applied in a number of important areas because of its simplicity, fast identification and low error rate. Typically, an iris recognition system includes four parts: iris localization, feature extraction, coding and recognition, of which iris localization is a critical step. In this paper, a fast iris localization algorithm based on an improved Hough transform is proposed. First, the algorithm builds the gray histogram of the iris image to determine the gray threshold of the iris boundary. Then the pupil image is binarized, and erosion and dilation or region growing are used to remove noise; as a result, the radius of the inner edge is obtained. Iris localization is then performed with the Hough transform according to the geometrical and gray features of the human eye image. By narrowing the search scope, both localization speed and accuracy are improved, and the method has good robustness for real-time systems. Experimental results show that the proposed method is effective and encouraging.

Lu Wang, Gongping Yang, Yilong Yin
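
Circle detection with OpenCV's Hough transform is the building block the abstract improves on; in the sketch below, the file path, thresholds, and radius bounds are invented placeholders that would be tuned per dataset, and the paper's refinements (histogram thresholding, narrowed search) are not reproduced.

```python
# Hough circle detection on a grayscale eye image ("iris.png" is a placeholder).
import cv2

img = cv2.imread("iris.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)                     # suppress noise before voting
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                           param1=100, param2=30,
                           minRadius=20, maxRadius=120)  # narrowed radius range
if circles is not None:
    x, y, r = circles[0][0]                      # strongest circle, e.g. pupil edge
    print(f"center=({x:.0f},{y:.0f}) radius={r:.0f}")
```
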
Face Recognition Using Consistency Method and Its Variants

Semi-supervised learning has become an active area of recent research in machine learning, and many approaches to it have been presented to date. In this paper, the consistency method and some of its variants are studied in depth. A detailed proof of an important condition for the convergence of the consistency method is given, and we further study the validity of some variants of the method. Finally, we conduct an experimental study of the parameters involved in the consistency method applied to face recognition, and compare the performance of the consistency method and its variants with that of supervised learning with support vector machines.

Kai Li, Nan Yang, Xiuchen Ye

Neural Networks

Clonal Selection Algorithm for Learning Concept Hierarchy from Malay Text

Concept hierarchy is an integral part of ontology, which is the backbone of the Semantic Web. This paper describes a new hierarchical clustering algorithm for learning concept hierarchies, named the Clonal Selection Algorithm for Learning Concept Hierarchy (CLONACH). The proposed algorithm resembles CLONALG. CLONACH's effectiveness is evaluated on three data sets. The results show that the concept hierarchies produced by CLONACH are better than those of the agglomerative clustering technique in terms of taxonomic overlap. Thus, the CLONALG-based algorithm is a promising technique for learning from texts, in particular small collections of texts.

Mohd Zakree Ahmad Nazri, Siti Mariyam Shamsuddin, Azuraliza Abu Bakar
Action Potential Classification Based on LVQ Neural Network

Neural action potential classification is a prerequisite for further neural information processing, but accurate classification is difficult because of the large amount of noise present. In this paper, we propose an action potential classification method based on a Learning Vector Quantization (LVQ) network. In the experimental stage, the performance of the presented system was tested at various signal-to-noise ratio levels on synthetic data. The results show that the proposed classification method is effective and offers a new approach to the action potential classification problem.

Jian-Hua Dai, Qing Xu, Mianrui Chai, Qida Hu
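A minimal LVQ1 sketch conveys the core update: the nearest prototype moves toward a spike waveform when the labels agree and away otherwise. Prototype counts, learning rate and epoch count are illustrative assumptions.

```python
# Minimal LVQ1 sketch for spike-waveform classification (parameters assumed).
import numpy as np

def lvq1_train(X, y, protos_per_class=2, lr=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    P, L = [], []
    for c in np.unique(y):                 # initialize prototypes per class
        idx = rng.choice(np.where(y == c)[0], protos_per_class, replace=False)
        P.append(X[idx]); L += [c] * protos_per_class
    P, L = np.vstack(P).astype(float), np.array(L)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = ((P - X[i]) ** 2).sum(1).argmin()    # nearest prototype
            sign = 1.0 if L[j] == y[i] else -1.0     # attract or repel
            P[j] += sign * lr * (X[i] - P[j])
    return P, L

def lvq1_predict(P, L, X):
    return L[((P[None] - X[:, None]) ** 2).sum(-1).argmin(1)]
```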
Back Propagation Approach for Semi-supervised Learning in Granular Computing

Zadeh proposes that there are three basic concepts underlying human cognition: granulation, organization and causation, and that a granule is a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality. Tolerance relations can describe granular systems. In this paper, a novel definition of a granular system (GS), described by a metric function under the framework of tolerance relations, is presented; concepts are created upon the GS; and semi-supervised learning is introduced into granular computing for concept creation. For this purpose, a novel back propagation approach is developed for concept learning. Experiments show that the new BP approach is better than the traditional EM algorithm when samples do not come from a random source with the density we want to estimate.

Hong Hu, Weimin Liu, Zhongzhi Shi

Complex Networks

WebRank: A Hybrid Page Scoring Approach Based on Social Network Analysis

Applying centrality measures from social network analysis to score web pages can capture the essential role of pages and distribute their authority in a web social network with complex link structures. To score pages effectively, we propose a hybrid page scoring algorithm, called WebRank, based on the PageRank algorithm and three centrality measures: degree, betweenness, and closeness. The basic idea of WebRank is to (1) use PageRank to accurately rank pages, and (2) apply centrality measures to compute the importance of pages in web social networks. To evaluate the performance of WebRank, we develop a web social network analysis system which can partition web pages into distinct groups and score them effectively. Experiments conducted on real data show that WebRank scores web pages effectively and at lower time cost than centrality-based social network analysis algorithms and PageRank.

Shaojie Qiao, Jing Peng, Hong Li, Tianrui Li, Liangxu Liu, Hongjun Li
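With networkx, the hybrid idea can be sketched as a weighted sum of PageRank and the three centralities; the weights below are illustrative assumptions, not WebRank's actual combination.

```python
# Minimal hybrid page-scoring sketch (combination weights are assumptions).
import networkx as nx

def hybrid_score(G, w=(0.4, 0.2, 0.2, 0.2)):
    pr  = nx.pagerank(G, alpha=0.85)          # link-structure ranking
    deg = nx.degree_centrality(G)             # SNA centralities
    bet = nx.betweenness_centrality(G)
    clo = nx.closeness_centrality(G)
    return {v: w[0]*pr[v] + w[1]*deg[v] + w[2]*bet[v] + w[3]*clo[v] for v in G}

G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
print(sorted(hybrid_score(G).items(), key=lambda kv: -kv[1]))
```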
Superficial Method for Extracting Social Network for Academics Using Web Snippets

Social network analysis (SNA) has become one of the main themes in the Semantic Web agenda, and the Web is steadily gaining ground in the study of social networks. A few researchers have shown the possibility of extracting social networks from the Web via search engines; however, obtaining a rich and trusted social network from such an approach has proved difficult. In this paper we propose an Information Retrieval (IR) driven method for dealing with the heterogeneity of features on the Web. We demonstrate the possibility of exploiting features in Web snippets returned by search engines to disambiguate entities and build relations among entities while extracting social networks. Our approach can extract underlying relation strengths that are beyond recognition with the standard co-occurrence analysis employed by much prior research.

Mahyuddin K. M. Nasution, Shahrul Azman Noah
Research of Spatio-temporal Similarity Measure on Network Constrained Trajectory Data

Similarity measurement between trajectories is a pre-processing step in trajectory data mining. Many shape-based and time-based methods for trajectory similarity measurement have been proposed recently. However, these methods do not perform well on trajectories constrained to a road network, because Euclidean distance is inappropriate there. In this paper, we study spatio-temporal similarity measures for trajectories in road networks. We first partition constrained trajectories on the road network into segments by considering both temporal and spatial properties, and then propose a spatio-temporal similarity measure for trajectory similarity analysis. Experimental results demonstrate the performance of the proposed methods and their applicability to trajectory clustering.

Ying Xia, Guo-Yin Wang, Xu Zhang, Gyoung-Bae Kim, Hae-Young Bae

Granular Computing

Dempster-Shafer Evidence Theory Based Multi-Characteristics Fusion for Clustering Evaluation

Clustering is a widely used unsupervised learning method that groups data with similar characteristics. The performance of a clustering method can generally be evaluated through validity indices. However, most validity indices are designed for specific algorithms and specific structures of the data space. Moreover, these indices consist of a few within-cluster and between-cluster distance functions, and their applicability relies heavily on the correctness of how these functions are combined. In this research, we first summarize three common characteristics of clustering evaluation: (1) the clustering outcome can be evaluated by a group of validity indices if some efficient validity indices are available, (2) the clustering outcome can be measured by an independent intra-cluster distance function, and (3) the clustering outcome can be measured by neighborhood-based functions. Considering the complementary and unstable nature of these characteristics, we then apply Dempster-Shafer (D-S) evidence theory to fuse the three characteristics into a new index, termed the fused Multiple Characteristic Indices (fMCI). The fMCI is generally capable of evaluating clustering outcomes of arbitrary clustering methods on more complex structures of the data space. We conduct a number of experiments demonstrating that the fMCI is applicable to different clustering algorithms on different datasets, and that it achieves more accurate and robust clustering evaluation compared to existing indices.

Shihong Yue, Teresa Wu, Yamin Wang, Kai Zhang, Weixia Liu
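The fusion step rests on Dempster's rule of combination. A minimal sketch over basic probability assignments is shown below; the hypothesis sets and masses are illustrative, standing in for the evidence produced by the three characteristics.

```python
# Minimal Dempster's-rule sketch: fuse two bodies of evidence about the
# best number of clusters k (mass values below are illustrative).
def dempster_combine(m1, m2):
    # m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    fused, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            inter = A & B
            if inter:
                fused[inter] = fused.get(inter, 0.0) + a * b
            else:
                conflict += a * b               # mass on the empty set
    return {A: v / (1.0 - conflict) for A, v in fused.items()}

m1 = {frozenset({2}): 0.6, frozenset({2, 3}): 0.4}   # characteristic 1: k=2
m2 = {frozenset({3}): 0.5, frozenset({2, 3}): 0.5}   # characteristic 2: k=3
print(dempster_combine(m1, m2))
```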
Recognition of Internet Portal Users on the Basis of Their Behaviour

Our aim is to develop a methodology for recognizing Internet portal users on the basis of their behaviour. This is a classification task in which the decision attribute takes thousands of values and objects are described by sequences of symbols. We develop feature selectors that make the task tractable. Since behaviour alone usually does not distinguish individual users, we introduce user profiles, which are clusters of indiscernible users, and construct classifiers that match descriptions of user behaviour to user profiles. We also derive quality measures specific to our task.

Wojciech Jaworski
Hierarchical Information System and Its Properties

The starting point of rough-set-based data analysis is a data set, called an information system, whose columns are labeled by attributes, rows are labeled by objects of interest, and entries are attribute values. Hierarchical attribute values exist implicitly in many real-world applications, but they have seldom been taken into consideration in traditional rough set theory and its extensions. In this paper, each attribute in an information system is generalized to a concept hierarchy tree by considering hierarchical attribute values. A hierarchical information system is thereby obtained; it is induced by generalizing a given flat table into multiple data tables with different degrees of abstraction, which can be organized as a lattice. Moreover, we can choose any level for any attribute according to the needs of problem solving, and thus discover knowledge at different levels of abstraction. A hierarchical information system can genuinely process data at multiple levels and from multiple views.

Qinrong Feng
Feature-Weighted Mountain Method with Its Application to Color Image Segmentation

In this paper, we propose a feature-weighted mountain clustering method. The proposed method works well in the presence of noisy feature variables and can be useful for obtaining initial estimates of cluster centers for other clustering algorithms. Results from color image segmentation illustrate that the proposed method produces better segmentations than previous methods.

Wen-Liang Hung, Miin-Shen Yang, Jian Yu, Chao-Ming Hwang
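The mountain function scores every grid vertex by the (here, feature-weighted) density of data around it; the highest peak is taken as the first cluster center. The weighting scheme and α below are assumptions for illustration, not the paper's formulation.

```python
# Minimal feature-weighted mountain-function sketch (alpha, weights assumed).
import numpy as np

def weighted_mountain(X, grid, w, alpha=5.0):
    # X: (n, d) data; grid: (g, d) candidate centers; w: (d,) feature weights.
    d2 = (w * (X[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    return np.exp(-alpha * d2).sum(0)        # mountain height per vertex

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(1, 0.1, (50, 2))])
grid = np.stack(np.meshgrid(np.linspace(-0.5, 1.5, 21),
                            np.linspace(-0.5, 1.5, 21)), -1).reshape(-1, 2)
print(grid[weighted_mountain(X, grid, w=np.array([1.0, 1.0])).argmax()])
```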
An Improved FCM Clustering Method for Interval Data

In the fuzzy c-means (FCM) clustering algorithm, each data point belongs to a cluster with a degree specified by a membership grade. FCM partitions a collection of vectors into c fuzzy groups and finds a cluster center in each group so that the objective function is minimized. This paper introduces a clustering method for objects described by interval data. It extends the FCM clustering algorithm by using combined distances. Moreover, simulated experiments with interval data sets have been performed to show the usefulness of this method.

Shen-Ming Gu, Jian-Wen Zhao, Ling He
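One way to picture the extension: run the usual FCM alternation, but keep separate centers for the interval lower and upper bounds and sum the two squared distances. That particular combined distance is an assumption standing in for the paper's combination.

```python
# Minimal interval-data FCM sketch (the combined distance is an assumption).
import numpy as np

def interval_fcm(L, U, c=2, m=2.0, iters=100, seed=0):
    # L, U: (n, d) lower and upper interval bounds of n objects.
    rng = np.random.default_rng(seed)
    Um = rng.dirichlet(np.ones(c), len(L))           # memberships (n, c)
    for _ in range(iters):
        W = Um ** m
        vL = (W.T @ L) / W.sum(0)[:, None]           # lower-bound centers
        vU = (W.T @ U) / W.sum(0)[:, None]           # upper-bound centers
        d2 = (((L[:, None] - vL[None]) ** 2).sum(-1)
              + ((U[:, None] - vU[None]) ** 2).sum(-1)) + 1e-12
        inv = d2 ** (-1.0 / (m - 1))
        Um = inv / inv.sum(1, keepdims=True)         # standard FCM update
    return Um, vL, vU
```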
An Improved FCM Algorithm for Image Segmentation

A novel image segmentation method based on a modified fuzzy c-means (FCM) algorithm is proposed in this paper. Using neighborhood pixels as spatial information, a spatial-constraint penalty term is added to the objective function of standard FCM to improve robustness. Meanwhile, neighboring pixel information is used selectively to reduce computing time and the number of iterations. Experiments on real image segmentation demonstrate the effectiveness of the improvement.

Kunlun Li, Zheng Cao, Liping Cao, Ming Liu
A Neighborhood Density Estimation Clustering Algorithm Based on Minimum Spanning Tree

In this paper a clustering algorithm based on the minimum spanning tree (MST) with neighborhood density difference estimation is proposed. Neighborhoods are defined by patterns connected by edges in the MST of a given dataset. Based on the density difference between patterns and their neighbors, boundary patterns and the corresponding boundary edges are detected. The boundary edges are then cut, and as a result the dataset is split into a predefined number of clusters. To reduce the time complexity of detecting boundary patterns, a rough and a refined boundary-candidate estimation approach are employed, respectively. Experiments are performed on synthetic and real data, and the clustering results demonstrate that the proposed algorithm can handle poorly separated, shape-diverse clusters.

Ting Luo, Caiming Zhong
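The classical skeleton of MST clustering is easy to state: build the tree, then cut edges to split it into the desired number of components. The sketch below uses the simplest cut (the k−1 heaviest edges) in place of the paper's boundary-density detection, which is an assumption.

```python
# Minimal MST-clustering sketch (heaviest-edge cut stands in for the
# paper's boundary-density criterion).
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(X, k):
    mst = minimum_spanning_tree(squareform(pdist(X))).toarray()
    edges, weights = np.argwhere(mst > 0), mst[mst > 0]
    for i in np.argsort(weights)[-(k - 1):]:   # cut the k-1 heaviest edges
        r, c = edges[i]
        mst[r, c] = 0.0
    return connected_components(mst + mst.T, directed=False)[1]
```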

Metaheuristic

Hybrid Differential Evolution for Global Numerical Optimization

Differential evolution (DE) is an efficient and versatile evolutionary algorithm for global numerical optimization over continuous domains. Although DE is good at exploring the search space, it is slow at exploiting solutions. To alleviate this drawback, we propose a generalized hybrid generation scheme that attempts to enhance exploitation and accelerate the convergence of the original DE algorithm. In the hybrid generation scheme, an operator with strong exploitation ability is hybridized with the original DE operator, and a self-adaptive exploitation factor is introduced to control the frequency of the exploitation operation. To evaluate the performance of the proposed scheme, the migration operator of biogeography-based optimization is employed as the exploitation operator, and 23 benchmark functions (including 10 test functions from the CEC 2005 special session) are chosen from the literature as the test suite. Experimental results confirm that the new hybrid generation scheme enhances the exploitation of the original DE algorithm and speeds up its convergence rate.

Liyuan Jia, Lei Li, Wenyin Gong, Li Huang
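A minimal sketch of the hybrid generation scheme is given below. The paper uses BBO's migration operator for exploitation; here, with probability p_ex, the trial vector is simply pulled toward the current best individual, which is a stand-in assumption, and the exploitation factor is kept fixed rather than self-adaptive.

```python
# Minimal hybrid DE/rand/1/bin sketch (exploitation operator is a stand-in).
import numpy as np

def hybrid_de(f, lo, hi, pop=30, F=0.5, CR=0.9, p_ex=0.2, gens=200, seed=0):
    rng = np.random.default_rng(seed)
    d = len(lo)
    X = rng.uniform(lo, hi, (pop, d))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(gens):
        best = X[fit.argmin()]
        for i in range(pop):
            r1, r2, r3 = rng.choice([j for j in range(pop) if j != i], 3,
                                    replace=False)
            v = X[r1] + F * (X[r2] - X[r3])      # DE/rand/1 mutation
            if rng.random() < p_ex:              # exploitation step
                v = 0.5 * (v + best)
            mask = rng.random(d) < CR
            mask[rng.integers(d)] = True         # binomial crossover
            u = np.clip(np.where(mask, v, X[i]), lo, hi)
            if (fu := f(u)) <= fit[i]:           # greedy selection
                X[i], fit[i] = u, fu
    return X[fit.argmin()], fit.min()

x, fx = hybrid_de(lambda z: (z * z).sum(), np.full(5, -5.0), np.full(5, 5.0))
```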
A Tabu-Based Memetic Approach for Examination Timetabling Problems

Constructing examination timetables for higher educational institutions is a very complex task due to the complexity of the issues involved. The objective of the examination timetabling problem is to satisfy the hard constraints and minimize the violations of soft constraints. In this work, a tabu-based memetic approach is applied and evaluated against the latest methodologies in the literature on standard benchmark problems. The approach hybridizes the concepts of tabu search and memetic algorithms: a tabu list is used to penalise neighbourhood structures that are unable to generate better solutions after the crossover and mutation operators have been applied to solutions selected from the population pool. We demonstrate that our approach enhances the quality of the solutions by carefully selecting effective neighbourhood structures, and some best known results have been obtained.

Salwani Abdullah, Hamza Turabieh, Barry McCollum, Paul McMullan
The Geometric Constraint Solving Based on the Quantum Particle Swarm

A geometric constraint problem is essentially equivalent to solving a set of nonlinear equations, and the constraint problem can be transformed into an optimization problem, which we solve with a quantum particle swarm. PSO is driven by the simulation of a social-psychological metaphor motivated by the collective behavior of birds and other social organisms, rather than by the survival of the fittest individual. Inspired by the classical PSO method and by quantum mechanics, this work presents a novel quantum-behaved PSO (QPSO). Experiments show that it improves algorithmic efficiency.

Cao Chunhong, Wang Limin, Li Wenhui
Fish Swarm Intelligent Algorithm for the Course Timetabling Problem

In this work, a simulation of fish swarm intelligence is applied to the course timetabling problem. The proposed algorithm simulates the movements of fish searching for food inside a body of water (referred to as the search space). The search space is classified, based on the visual scope of the fish, into three categories: crowded, not crowded and empty areas. Each fish represents a solution in the solution population, and the movement direction of solutions is determined by a Nelder-Mead simplex algorithm. Two types of local search, i.e., a multi-decay-rate great deluge (where the decay rate is intelligently controlled by the movement direction) and a steepest descent algorithm, are applied to enhance solution quality. The performance of the proposed approach has been tested on a standard course timetabling problem, and computational experiments indicate that it produces best known results on a number of these benchmark problems.

Hamza Turabieh, Salwani Abdullah, Barry McCollum, Paul McMullan

Special Session: Cloud Model and Its Application

A Supervised and Multivariate Discretization Algorithm for Rough Sets

Rough set theory has become an important mathematical tool for dealing with imprecise, incomplete and inconsistent data. As is well known, rough set theory works better on discretized or binarized data; however, most real-life data sets contain not only discrete attributes but also continuous ones. In this paper, we propose a supervised and multivariate discretization algorithm for rough sets, SMD, which uses both class information and relations between attributes to determine the discretization scheme. To evaluate SMD, we ran it on real-life data sets from the UCI Machine Learning Repository. The experimental results show that our algorithm is effective, and its time complexity is relatively low compared with current multivariate discretization algorithms.

Feng Jiang, Zhixi Zhao, Yan Ge
Comparative Study of Type-2 Fuzzy Sets and Cloud Model

The mathematical representation of a concept with uncertainty is one of the foundations of Artificial Intelligence. Type-2 fuzzy sets study the fuzziness of the membership grade of a concept. The cloud model, based on a probability measure space, automatically produces random membership grades of a concept through a cloud generator. Both methods concentrate on the essentials of uncertainty and have been applied in many fields for more than ten years; however, their mathematical foundations are quite different. A detailed comparative study reveals the relationship between the two and provides a fundamental contribution to Artificial Intelligence with uncertainty.

Kun Qin, Deyi Li, Tao Wu, Yuchao Liu, Guisheng Chen, Baohua Cao
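The cloud generator mentioned above is, in its standard forward form, only a few lines: the entropy is itself randomized by the hyper-entropy, and each cloud drop receives a random membership grade.

```python
# Minimal forward normal cloud generator: (Ex, En, He) -> cloud drops.
import numpy as np

def normal_cloud(Ex, En, He, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    En_prime = rng.normal(En, He, n)                   # randomized entropy
    x = rng.normal(Ex, np.abs(En_prime))               # drop positions
    mu = np.exp(-(x - Ex) ** 2 / (2 * En_prime ** 2))  # random memberships
    return x, mu

x, mu = normal_cloud(Ex=25.0, En=3.0, He=0.3)  # e.g. the concept "about 25"
```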
Operations of Fuzzy Numbers via Genuine Set

In this paper, we discuss arithmetic operations on fuzzy numbers from the point of view of Genuine sets. The paper studies, in particular, operations on triangular fuzzy numbers and presents an effective algorithm to compute them. For convenience of calculation, we divide a general triangular fuzzy number into two symmetric triangular fuzzy numbers using the properties of the extension maximum and minimum operations.

Jun Han, Baoqing Hu
An Uncertain Control Framework of Cloud Model

The mathematical representation of a concept with uncertainty is one of the foundations of Artificial Intelligence. Uncertain control is central to VSC systems and nonlinear control systems, where a representation of uncertainty is required. The cloud model represents uncertainty with the expectation Ex, entropy En and hyper-entropy He, combining fuzziness and randomness. Since randomness and fuzziness make uncertain control a difficult problem, we propose an uncertain control framework based on the cloud model, called UCF-CM. UCF-CM tunes the parameters Ex, En and He with a cloud, a cloud controller and a cloud adapter to generate self-adaptive control in the presence of uncertainty. Finally, a representative application of UCF-CM, controlling the growth process of artificial plants, is implemented to verify its validity and feasibility.

Baohua Cao, Deyi Li, Kun Qin, Guisheng Chen, Yuchao Liu, Peng Han
A Comparative Study of Cloud Model and Extended Fuzzy Sets

Since Zadeh introduced fuzzy sets in 1965 to generalize the indicator functions of classical sets to a membership function valued in the interval [0,1], many extensions of the original fuzzy sets have been proposed. For example, interval-valued fuzzy sets, intuitionistic fuzzy sets and vague sets approximate the crisp degree of membership by an interval, while type-2 fuzzy sets generalize it by a fuzzy set in [0,1]. Beyond these, a recent extension named cloud model has attracted much attention; it incorporates probability measures into fuzzy sets and permits the simultaneous consideration of randomness and fuzziness of linguistic concepts by extending the degree of membership to random variables defined in the interval [0,1]. Basically, with its three numerical characteristics, a cloud model can randomly generate a degree of membership of an element and implement the uncertain transformation between linguistic concepts and their quantitative instantiations. This paper mainly focuses on a comparative study of the cloud model and other extensions of fuzzy sets, especially the theoretical significance of the cloud model. The comparative study shows that, compared with other extensions, the cloud model offers a brand-new method to handle and represent the inherent uncertainty of linguistic concepts, especially fuzziness, randomness and their relationship.

Changyu Liu, Wenyan Gan, Tao Wu
A Variable Step-Size LMS Algorithm Based on Cloud Model with Application to Multiuser Interference Cancellation

This paper presents a variable step-size Least Mean Square (LMS) algorithm based on the cloud model, a cognitive model for uncertain transformation between linguistic concepts and quantitative values. In this algorithm, we use the error difference between two adjacent iteration periods to qualitatively estimate the state of the algorithm, and translate it into an appropriate numerical step size according to the linguistic description of the basic principle of variable step-size LMS. Simulation results show that the proposed algorithm improves steady-state performance while maintaining a good convergence rate. We also apply the new algorithm to multiuser interference cancellation, with satisfactory results.

Wen He, Deyi Li, Guisheng Chen, Songlin Zhang
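A minimal sketch of a variable step-size LMS loop is shown below. The paper maps the error difference through a cloud model to obtain the step size; here a classic numeric rule (μ ← aμ + ge², clipped) stands in as an assumption.

```python
# Minimal variable step-size LMS sketch (the step-size rule is a stand-in
# for the paper's cloud-model mapping).
import numpy as np

def vss_lms(x, d, taps=8, mu=0.01, a=0.97, g=5e-4, mu_min=1e-4, mu_max=0.1):
    w, err = np.zeros(taps), np.zeros(len(d))
    for n in range(taps - 1, len(d)):
        u = x[n - taps + 1:n + 1][::-1]      # most recent input samples
        e = d[n] - w @ u                     # instantaneous error
        mu = float(np.clip(a * mu + g * e * e, mu_min, mu_max))
        w += mu * e * u                      # LMS weight update
        err[n] = e
    return w, err
```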
A Qualitative Requirement and Quantitative Data Transform Model

The development of Internet technology and human computing has greatly changed the traditional single-computer software paradigm. Service-oriented computing has become the new trend, and everything is delivered as a service. The rise of the mobile Internet provides an even more convenient way for people to join Internet activity. Thus, matching the requirements of many users with available services is the main problem. The cloud model, a combined qualitative and quantitative model, can be used to bridge qualitative requirements in the problem domain and quantitative data services in the solution domain.

Yuchao Liu, Junsong Yin, Guisheng Chen, Songlin Zhang

Special Session: Data Mining in Cloud Computing

The High-Activity Parallel Implementation of Data Preprocessing Based on MapReduce

Data preprocessing is an important and basic technique for data mining and machine learning. Due to the dramatic increase in information, traditional data preprocessing techniques are time-consuming and unfit for processing massive data. To tackle this problem, we present parallel data preprocessing techniques based on MapReduce, a programming model that makes parallelization easy to implement. This paper gives implementation details of the techniques, including data integration, data cleaning and data normalization. The proposed parallel techniques can deal with large-scale data (up to terabytes) efficiently, and our experimental results show considerable speedup with an increasing number of processors.

Qing He, Qing Tan, Xudong Ma, Zhongzhi Shi
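As one example of the pattern, min-max normalization splits cleanly into map and reduce phases: map tasks emit per-split statistics, the reduce merges them, and a second map pass rescales. The function names are illustrative; a real job would run on a MapReduce cluster rather than in one process.

```python
# Minimal map/reduce-style normalization sketch (names are illustrative).
import numpy as np

def map_stats(split):                       # map: per-split statistics
    return split.min(axis=0), split.max(axis=0)

def reduce_stats(stats):                    # reduce: merge to global min/max
    mins, maxs = zip(*stats)
    return np.min(mins, axis=0), np.max(maxs, axis=0)

def map_scale(split, lo, hi):               # second map pass: rescale
    return (split - lo) / np.where(hi > lo, hi - lo, 1.0)

splits = [np.random.rand(100, 3) * 10 for _ in range(4)]   # simulated shards
lo, hi = reduce_stats([map_stats(s) for s in splits])
normalized = [map_scale(s, lo, hi) for s in splits]
```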
Parallel Implementation of Classification Algorithms Based on MapReduce

Data mining has attracted extensive research for several decades. As an important task of data mining, classification plays an important role in information retrieval, web search, CRM, etc. Most present classification techniques are serial, which becomes impractical for large datasets: computing resources are under-utilized and the execution time is unacceptable. Given the MapReduce programming model, we propose parallel implementation methods for several classification algorithms, such as k-nearest neighbors, the naive Bayesian model and decision trees. Preliminary experiments show that the proposed parallel methods can not only process large datasets but can also be scaled out to a cluster, which significantly improves efficiency.

Qing He, Fuzhen Zhuang, Jincheng Li, Zhongzhi Shi
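The k-nearest-neighbors case illustrates the decomposition well: each map task finds local candidates within one data split, and the reduce merges candidates and votes. This is a single-process sketch with illustrative names, not the authors' implementation.

```python
# Minimal map/reduce-style parallel k-NN sketch (single-process simulation).
import numpy as np
from collections import Counter

def knn_map(split_X, split_y, test_X, k):
    # Emit (test_index, [(dist, label), ...]) candidates from one split.
    d = ((test_X[:, None] - split_X[None]) ** 2).sum(-1)
    return [(i, [(d[i, j], split_y[j]) for j in np.argsort(d[i])[:k]])
            for i in range(len(test_X))]

def knn_reduce(pairs, k):
    # Merge per-split candidates, keep the global k nearest, majority-vote.
    merged = {}
    for i, cands in pairs:
        merged.setdefault(i, []).extend(cands)
    return {i: Counter(lbl for _, lbl in sorted(c)[:k]).most_common(1)[0][0]
            for i, c in merged.items()}
```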
Research on Data Processing of RFID Middleware Based on Cloud Computing

Research on RFID middleware algorithms has mainly focused on removing data redundancy, with less attention to data noise. To address this, this paper builds a cube model of RFID middleware data processing, proposes algorithms that remove both data redundancy and data noise, and provides a product-tracking service to upper-layer systems. Cloud computing supplies the powerful computing capability needed to process the cube model. Analysis of the algorithms and the model shows that the cube model describes tag status in reader areas well, its algorithms remove data redundancy and data noise, and it provides a product-tracking service to upper-layer systems.

Zheng-Wu Yuan, Qi Li
Attribute Reduction for Massive Data Based on Rough Set Theory and MapReduce

Data processing and knowledge discovery for massive data has always been a hot topic in data mining, and with the coming era of cloud computing, mining massive data is becoming a highlighted research topic. In this paper, attribute reduction for massive data based on rough set theory is studied. The MapReduce parallel programming model is introduced and combined with the attribute reduction algorithm of rough set theory, and a parallel attribute reduction algorithm based on MapReduce is proposed. Experimental results show that the proposed method is more efficient for massive data mining than the traditional method and is an effective approach to data mining on cloud computing platforms.

Yong Yang, Zhengrong Chen, Zhu Liang, Guoyin Wang

Special Session: Decision-Theoretic Rough Set Model

Analysis of Rough and Fuzzy Clustering

With the growing popularity of rough clustering, the soft computing research community is studying the relationship between rough and fuzzy clustering as well as their relative advantages. Both rough and fuzzy clustering are less restrictive than conventional clustering, and fuzzy clustering memberships are more descriptive than rough clustering. In some cases descriptive fuzzy clustering may be advantageous, while in others it may lead to information overload. This paper provides an experimental comparison of the two clustering techniques and describes a procedure for converting fuzzy membership clustering into rough clustering. Such a conversion is not always necessary, however, especially if one only needs lower and upper approximations. Experiments also show that descriptive fuzzy clustering may not always (particularly for high-dimensional objects) produce results as accurate as the direct application of rough clustering. We present an analysis of the results from both techniques.

Manish Joshi, Pawan Lingras, C. Raghavendra Rao
Autonomous Knowledge-Oriented Clustering Using Decision-Theoretic Rough Set Theory

This paper proposes an autonomous knowledge-oriented clustering method based on the decision-theoretic rough set model. To obtain the initial knowledge-oriented clustering, the threshold values are produced autonomously, drawing on ideas from physics, rather than being set by human intervention. Furthermore, the paper proposes a cluster validity index based on the decision-theoretic rough set model that considers various loss functions. Experiments with synthetic and standard data show that the method not only helps select a termination point for the clustering algorithm, but is also useful for clustering overlapping boundaries, which are common in many data mining applications.

Hong Yu, Shuangshuang Chu, Dachun Yang
An Attribute Reduction of Rough Set Based on PSO

The basic concept of attribute reduction in rough set theory (RST) is combined with the idea of particle swarm optimization (PSO), and a new reduction algorithm based on PSO is developed. Furthermore, the idea of a cache is introduced into the proposed method, which effectively reduces the algorithm's complexity. Experimental results demonstrate that the algorithm is simple and viable.

Hongyuan Shen, Shuren Yang, Jianxun Liu
Multiple-Category Classification with Decision-Theoretic Rough Sets

A two-stage Bayesian decision procedure is proposed to solve multiple-category classification problems. The first stage changes an m-category classification problem into m two-category classification problems and forms three classes of rules with different actions and decisions by using decision-theoretic rough sets with the Bayesian decision procedure. The second stage chooses the best candidate rules in the positive region using the minimum-probability-error criterion of Bayes decision theory. By considering the levels of tolerance for errors and the costs of actions in real decision procedures, we propose a new approach to multiple-category classification problems.

Dun Liu, Tianrui Li, Pei Hu, Huaxiong Li
A Multi-agent Decision-Theoretic Rough Set Model

The decision-theoretic rough set (DTRS) model considers cost and risk factors when classifying an equivalence class into a particular region; using DTRS, informative decisions with rough rules can be made. Current research focuses on single-agent decision making. We propose a multi-agent DTRS model that seeks synthesized or consensus decisions when multiple sets of decision preferences and criteria are adopted by different agents. A set of rough decision rules satisfied by multiple agents can be derived from the model.

Xiaoping Yang, Jingtao Yao
Naive Bayesian Rough Sets

A naive Bayesian classifier is a probabilistic classifier based on Bayesian decision theory with naive independence assumptions, which is often used for ranking or constructing a binary classifier. The theory of rough sets provides a ternary classification method by approximating a set into positive, negative and boundary regions based on an equivalence relation on the universe. In this paper, we propose a naive Bayesian decision-theoretic rough set model, or simply a naive Bayesian rough set (NBRS) model, to integrate these two classification techniques. The conditional probability is estimated based on the Bayes’ theorem and the naive probabilistic independence assumption. A discriminant function is defined as a monotonically increasing function of the conditional probability, which leads to analytical and computational simplifications.

Yiyu Yao, Bing Zhou
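The abstract's two ingredients (Bayes' theorem with naive independence, plus a ternary region assignment) can be sketched directly; the thresholds α and β below are assumed values, since DTRS-style models derive them from loss functions.

```python
# Minimal naive-Bayesian rough set sketch (alpha, beta are assumed values).
import numpy as np

def nbrs_region(X, y, x_new, alpha=0.7, beta=0.3):
    # X: (n, d) discrete attribute table; y: (n,) class labels in {0, 1}.
    post = []
    for c in (0, 1):
        prior = (y == c).mean()
        # Product of per-attribute likelihoods: naive independence assumption.
        lik = np.prod([(X[y == c, j] == x_new[j]).mean() + 1e-9
                       for j in range(X.shape[1])])
        post.append(prior * lik)
    p = post[1] / (post[0] + post[1])        # P(C | x) via Bayes' theorem
    return "POS" if p >= alpha else ("NEG" if p <= beta else "BND")
```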

Special Session: Quotient Space Theory Research and Application

Protein Interface Residues Recognition Using Granular Computing Theory

Prediction of protein-protein interaction sites (PPIs) has mostly been studied in a single granular space, in which the correlations among different levels are neglected. In this paper, PPIs models are constructed in different granular spaces based on quotient space theory. We mainly use the HSSP profile and the PSI-Blast profile as two features for the granular spaces, then use granularity synthesis theory to synthesize PPIs models from the different features, and finally improve the prediction using the number of neighboring residues. With this method, an accuracy of 59.99% with sensitivity (68.87%), CC (0.2113), F-measure (53.12%) and specificity (47.56%) is achieved after combining results from different levels. We then develop a post-processing scheme that improves the prediction using the relative locations of the predicted residues. The best results are then achieved, with sensitivity, specificity, CC, accuracy and F-measure of 74.96%, 47.87%, 0.2458, 59.63% and 54.66%, respectively. The experimental results presented here demonstrate that the multi-granular method can be applied to the automated identification of protein interface residues.

Jiaxing Cheng, Xiuquan Du, Jiehua Cheng
Application of Quotient Space Theory in Input-Output Relationship Based Combinatorial Testing

The input-output relationship based combinatorial testing strategy considers the interactions of factors and generates a test suite to cover them, rather than covering all t-way interactions; it is a very effective combinatorial testing method. However, because of its black-box nature, the identified input-output relationships may in some cases be nonexistent or incomplete, and the range of fault diagnosis is wide. To address these issues, we apply quotient space theory to input-output combinatorial testing and extend the traditional strategy to gray-box testing by identifying the input-output relationships of different granules of a program and constructing a test suite for each granule. At the end of the paper, we evaluate the strategy with a case study, which indicates its effectiveness.

Longshu Li, Yingxia Cui, Sheng Yao
Granular Analysis in Clustering Based on the Theory of Fuzzy Tolerance Quotient Space

Clustering essentially defines an equivalence relation between samples: two samples are equivalent if they belong to the same class. The coarseness or fineness of the granularity reflects the similarity threshold in clustering. In this paper, the disadvantage of granular analysis in clustering based on quotient space theory, namely that it cannot handle intersections between classes, is pointed out; the theory of fuzzy tolerance quotient space is introduced; and granular analysis in clustering based on the theory of fuzzy tolerance quotient space is presented. Results of an experiment on clustering radio communication signals show the efficiency of the algorithm.

Lunwen Wang, Lunwu Wang, Zuguo Wu
Computing the Point-to-Point Shortest Path: Quotient Space Theory’s Application in Complex Network

Quotient space theory can represent the world at different granularities and deal with complicated problems hierarchically. We present a significant improvement to point-to-point shortest path computation in complex large-scale networks based on quotient space theory. We propose a heuristic shortest path algorithm whose evaluation function is based on communities and the hierarchical granularity decomposition of quotient space theory. In preprocessing, we decompose the large-scale network into communities using this hierarchical granularity decomposition, and compute and store the minimum spanning trees within communities and the shortest distances among communities. The implementation works on a large-scale road network, and experimental results show that the proposed algorithm is effective and efficient on the US road network.

Fugui He, Yanping Zhang, Shu Zhao, Ling Zhang
Fuzzy Measures and Granular Computing

From the viewpoint of granular computing, by analyzing the structure of fuzzy measures with the quotient space analytical method, we obtain the following results: (1) a necessary and sufficient condition for the isomorphism of fuzzy measure functions in fuzzy mathematics; and (2) a necessary and sufficient condition for the fuzzy and granular monotonicity of fuzzy measures. The results reveal the essence of fuzzy measures and provide a simple way to construct them. They also show that the quotient space analytical method is ubiquitous and effective in many fields.

Ling Zhang, Bo Zhang
Identifying Protein-Protein Interaction Sites Using Granularity Computing of Quotient Space Theory

Protein-protein interaction is very important to cell activity, and studying it can help us understand life processes and aid pharmaceutical design. In this study, a kernel covering algorithm combined with the granular computing theory of quotient space (the KCA-GS model) is proposed for predicting protein-protein interaction sites. The method achieves good performance: the sensitivity, specificity, accuracy and correlation coefficient are 52.97%, 53.92%, 70.27% and 24.61%, respectively. This indicates that our method is an effective and promising way to identify protein-protein interaction sites.

Yanping Zhang, Yongcheng Wang, Jun Ma, Xiaoyan Chen
Moving Object Detection Based on Gaussian Mixture Model within the Quotient Space Hierarchical Theory

To address deficiencies of the Gaussian mixture model (GMM), an improvement is proposed in this paper. Each video frame is partitioned into coarse granules by an equivalence relation R, from which a quotient space is obtained, and the moving object is detected within it. Experiments show that the algorithm improves the detection rate of moving objects without impairing object identification.

Yanping Zhang, Yunqiu Bai, Shu Zhao
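A minimal sketch of GMM background subtraction is shown below using OpenCV's MOG2; the coarse-granularity step of the paper is approximated by pyramid downsampling, which is an assumption, and the input filename is hypothetical.

```python
# Minimal GMM moving-object detection sketch (downsampling stands in for
# the paper's quotient-space partitioning; "video.avi" is hypothetical).
import cv2

cap = cv2.VideoCapture("video.avi")
mog = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    coarse = cv2.pyrDown(frame)         # coarser granularity level
    mask = mog.apply(coarse)            # foreground mask = moving objects
    cv2.imshow("moving objects", mask)
    if cv2.waitKey(30) == 27:           # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```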
Backmatter
Metadata
Title
Rough Set and Knowledge Technology
Edited by
Jian Yu
Salvatore Greco
Pawan Lingras
Guoyin Wang
Andrzej Skowron
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-16248-0
Print ISBN
978-3-642-16247-3
DOI
https://doi.org/10.1007/978-3-642-16248-0