2014 | Book

Rough Sets and Knowledge Technology

9th International Conference, RSKT 2014, Shanghai, China, October 24-26, 2014, Proceedings

Editors: Duoqian Miao, Witold Pedrycz, Dominik Ślȩzak, Georg Peters, Qinghua Hu, Ruizhi Wang

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the thoroughly refereed conference proceedings of the 9th International Conference on Rough Sets and Knowledge Technology, RSKT 2014, held in Shanghai, China, in October 2014. The 70 papers presented were carefully reviewed and selected from 162 submissions. The papers in this volume cover topics such as foundations and generalizations of rough sets, attribute reduction and feature selection, applications of rough sets, intelligent systems and applications, knowledge technology, domain-oriented data-driven data mining, uncertainty in granular computing, advances in granular computing, big data to wise decisions, rough set theory, and three-way decisions, uncertainty, and granular computing.

Table of Contents

Frontmatter

Foundations and Generalizations of Rough Sets

Frontmatter
Generalized Dominance-Based Rough Set Model for the Dominance Intuitionistic Fuzzy Information Systems

A dominance-based rough set approach was proposed by replacing the indiscernibility relation with a dominance relation. The aim of this paper is to present a new extension of the dominance-based rough set by defining a new dominance relation; that is, a generalized dominance-based rough set model is proposed based on dominance intuitionistic fuzzy information systems. To obtain the optimal decision rules from existing dominance intuitionistic fuzzy information systems, lower and upper approximation reductions and a rule extraction algorithm are investigated. Furthermore, several properties of the generalized dominance-based rough set model are given, and the relationships between this model and other dominance-based rough set models are also examined.

Xiaoxia Zhang, Degang Chen
On Definability and Approximations in Partial Approximation Spaces

In this paper, we discuss the relationships among the basic building blocks of rough set theory: approximations, definable sets and exact sets. This is done in a very general framework, named Basic Approximation Space, that generalizes and encompasses previously known definitions of approximation spaces. In this framework, the lower and upper approximations as well as the boundary and exterior regions are independent from each other. Further, definable sets do not coincide with exact sets, the former being defined “a priori” and the latter only “a posteriori” on the basis of the approximations. The consequences of this approach in the particular case of partial partitions are developed and a discussion is started for the case of partial coverings.

Davide Ciucci, Tamás Mihálydeák, Zoltán Ernő Csajbók
Many-Valued Rough Sets Based on Tied Implications

We investigate a general many-valued rough set theory, based on tied adjointness algebras, from both constructive and axiomatic approaches. The class of tied adjointness algebras constitutes a particularly rich generalization of residuated algebras and deals with implications (on two independently chosen posets (L, ≤_L) and (P, ≤_P), interpreting two, possibly different, types of uncertainty) tied by an integral commutative ordered monoid operation on P. We show that this model introduces a flexible extension of rough set theory and covers many fuzzy rough set models studied in the literature. We expound the motivations behind the use of two lattices L and P in the definition of the approximation space, as a generalization of the usual one-lattice approach. This new setting increases the number of applications to which rough set theory can be applied.

Moataz El-Zekey
On the Topological Structure of Rough Soft Sets

The concept of rough soft set is introduced to generalize soft sets by using rough set theory, and then the soft topologies on soft sets are introduced.

Vinay Gautam, Vijay K. Yadav, Anupam K. Singh, S. P. Tiwari
Knowledge Granulation in Interval-Valued Information Systems Based on Maximal Consistent Blocks

Rough set theory, proposed by Pawlak in the early 1980s, is an extension of classical set theory for modeling uncertain or imprecise information. In this paper, we investigate partial relations and propose the concept of knowledge granulation based on maximal consistent blocks in interval-valued information systems. The knowledge granulation can provide an important approach to measuring the discernibility of different knowledge in interval-valued information systems. The results in this paper may be helpful for understanding the essence of rough approximation and attribute reduction in interval-valued information systems.

Nan Zhang, Xiaodong Yue
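
Several abstracts in this section build on Pawlak's classical approximations, so a minimal, hedged Python sketch of that construction may help readers new to the area. The decision table, attribute names, and helper functions below are invented for illustration; this is the standard equivalence-class-based lower/upper approximation, not the interval-valued maximal-consistent-block method of the paper above.

```python
from collections import defaultdict

def equivalence_classes(universe, attrs, table):
    """Group objects that agree on all chosen attributes."""
    blocks = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        blocks[key].add(x)
    return list(blocks.values())

def approximations(target, blocks):
    """Classical Pawlak lower/upper approximations of a target set."""
    lower = set().union(*(b for b in blocks if b <= target))
    upper = set().union(*(b for b in blocks if b & target))
    return lower, upper

# Hypothetical decision table: four objects described by two attributes.
table = {
    "x1": {"a": 1, "b": 0}, "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1}, "x4": {"a": 0, "b": 0},
}
blocks = equivalence_classes(table, ["a", "b"], table)
lower, upper = approximations({"x1", "x2", "x3"}, blocks)
print(lower, upper)  # lower ⊆ target ⊆ upper always holds
```
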
Multi-granulation Rough Sets Based on Central Sets

Exploring rough sets from the viewpoint of multi-granulation has become one of the promising topics in rough set theory, in which lower or upper approximations are approximated by multiple binary relations. The purpose of this paper is to develop two new kinds of multi-granulation rough set models by using concept of central sets in a given approximation space. Firstly, the concepts of the two new models are proposed. Then some important properties and the relationship of the models are disclosed. Finally, several uncertainty measures of the models are also proposed. These results will enrich the theory and application of multi-granulation rough sets.

Caihui Liu, Meizhi Wang, Yujiang Liu, Min Wang
Rough Set Theory on Topological Spaces

We consider rough sets that arise in an information system from the point of view of topology. The main purpose of this paper is to show how well-known topological concepts are closely related to rough sets and to generalize rough sets in the framework of topological spaces. We present the properties of the quasi-discrete topology and Π₀-rough sets.

K. Anitha

Attribute Reduction and Feature Selection

Frontmatter
Co-training Based Attribute Reduction for Partially Labeled Data

Rough set theory is an effective supervised learning model for labeled data. However, it is often the case that practical problems involve both labeled and unlabeled data. In this paper, the problem of attribute reduction for partially labeled data is studied. A novel semi-supervised attribute reduction algorithm is proposed, based on co-training, which capitalizes on the unlabeled data to improve the quality of attribute reducts obtained from few labeled data. It computes two diverse reducts of the labeled data, employs them to train its base classifiers, and then co-trains the two base classifiers iteratively. In every round, the base classifiers learn from each other on the unlabeled data and enlarge the labeled data, so reducts of better quality can be computed from the enlarged labeled data and employed to construct base classifiers of higher performance. The experimental results with UCI data sets show that the proposed algorithm can improve the quality of reducts.

Wei Zhang, Duoqian Miao, Can Gao, Xiaodong Yue
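
As a hedged sketch of the co-training loop described above (not the authors' exact procedure, which works on attribute reducts rather than an arbitrary feature split), the code below trains two classifiers on two feature views, lets each add its most confident unlabeled examples to the labeled pool, and retrains. The dataset, view split, confidence threshold, and round count are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
labeled = np.arange(40)                      # a small labeled pool
unlabeled = np.arange(40, 600)
views = [np.arange(0, 5), np.arange(5, 10)]  # stand-ins for two diverse reducts

X_lab, y_lab = X[labeled], y[labeled]
for _ in range(5):                           # co-training rounds
    clfs = [DecisionTreeClassifier(random_state=0).fit(X_lab[:, v], y_lab) for v in views]
    newly_added = []
    for clf, v in zip(clfs, views):
        if unlabeled.size == 0:
            break
        proba = clf.predict_proba(X[unlabeled][:, v])
        conf = proba.max(axis=1)
        picks = unlabeled[conf > 0.9][:10]   # most confident unlabeled examples
        if picks.size:
            X_lab = np.vstack([X_lab, X[picks]])
            y_lab = np.concatenate([y_lab, clf.predict(X[picks][:, v])])
            newly_added.extend(picks.tolist())
    unlabeled = np.setdiff1d(unlabeled, np.array(newly_added, dtype=int))

print("labeled pool grew from 40 to", len(y_lab))
```
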
Approximate Reduction for the Interval-Valued Decision Table

Many specific applications for electric power data, such as load forecasting and fault diagnosis, need to consider data changes during a period of time, rather than a single record, to determine their decision classes, because the class label of only one record is meaningless. Based on this observation, the interval-valued rough set is introduced. From the algebraic view, we define the related concepts and prove the properties of interval-valued reduction based on dependency, and present the corresponding heuristic reduction algorithm. In order to make the algorithm achieve better results in practical applications, approximate reduction is introduced. To evaluate the proposed algorithm, we experiment on six months of operating data of a 600 MW unit in a power plant. Experimental results show that the algorithm proposed in this article can maintain high classification accuracy with proper parameters, and the numbers of objects and attributes can both be greatly reduced.

Feifei Xu, Zhongqin Bi, Jingsheng Lei
Global Best Artificial Bee Colony for Minimal Test Cost Attribute Reduction

The minimal test cost attribute reduction is an important component in data mining applications, and plays a key role in cost-sensitive learning. Recently, several algorithms have been proposed to address this problem, and they can obtain acceptable results in most cases. However, their effectiveness on large datasets is often unacceptable. In this paper, we propose a global best artificial bee colony algorithm with an improved solution search equation for minimizing the test cost of attribute reduction. The solution search equation introduces a parameter associated with the current global optimal solution to enhance the local search ability. We apply our algorithm to four UCI datasets. The results reveal that the improvement of our algorithm tends to be obvious on most datasets tested. Specifically, the algorithm is effective on the large dataset Mushroom. In addition, compared to the information gain-based reduction algorithm and the ant colony optimization algorithm, the results demonstrate that our algorithm is more effective, and is thus more practical.

Anjing Fan, Hong Zhao, William Zhu
Reductions of Intuitionistic Fuzzy Covering Systems Based on Discernibility Matrices

Instead of an intuitionistic fuzzy covering, a new intuitionistic fuzzy binary relation is proposed using a set in an intuitionistic fuzzy covering, and correspondingly an intuitionistic fuzzy approximation space is obtained. Then a novel discernibility matrix is defined, which is based on the intuitionistic fuzzy binary relation we defined. Reductions of intuitionistic fuzzy covering information systems are then studied by keeping the intuitionistic fuzzy binary relation unchanged.

Tao Feng, Jusheng Mi
Multi-label Feature Selection with Fuzzy Rough Sets

Feature selection for multi-label classification tasks has attracted attention from the machine learning domain. Current algorithms transform a multi-label learning task into several binary single-label tasks, and then compute the average score of the features across all single-label tasks. Little research discusses the effect of averaging the scores. To this end, we discuss multi-label feature selection in the framework of fuzzy rough sets. We define novel dependency functions with three fusion methods once the fuzzy lower approximation of each label has been calculated. A forward greedy algorithm is constructed to reduce the redundancy of the selected features. Numerical experiments validate the performance of the proposed method.

Lingjun Zhang, Qinghua Hu, Jie Duan, Xiaoxue Wang
A Logarithmic Weighted Algorithm for Minimal Test Cost Attribute Reduction

Minimal test cost attribute reduction is an important problem in cost-sensitive learning since it reduces the dimensionality of the attribute space. To address this issue, many heuristic algorithms have been used by researchers; however, the effectiveness of these algorithms is often unsatisfactory on large-scale datasets. In this paper, we develop a logarithmic weighted algorithm to tackle the minimal test cost attribute reduction problem. More specifically, two major issues are addressed with regard to the logarithmic weighted algorithm. One relates to a logarithmic strategy that suggests a way of obtaining an attribute reduction that achieves the best results at the lowest cost. The other relates to the test costs, which are normalized to speed up the convergence of the algorithm. Experimental results show that our algorithm attains better cost-minimization performance than the existing weighted information gain algorithm. Moreover, when the test cost distribution is normal, the proposed algorithm is more effective for dealing with medium-sized and large-scale datasets.

Junxia Niu, Hong Zhao, William Zhu
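
For readers unfamiliar with test-cost-sensitive reduction in general, the toy greedy loop below weights an information-gain score by each attribute's test cost and stops once the selected attributes discern the decisions. The scoring formula, the trade-off exponent, the costs, and the data are all invented for illustration; this is not the logarithmic weighting or the datasets of the paper above.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of a single attribute with respect to the decision."""
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in split.values())
    return entropy(labels) - remainder

def consistent(rows, labels, attrs):
    """All objects identical on attrs must share the same decision."""
    seen = {}
    for row, lab in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, lab) != lab:
            return False
    return True

# Hypothetical decision table and per-attribute test costs.
rows = [{"a": 0, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 0},
        {"a": 1, "b": 0, "c": 1}, {"a": 0, "b": 0, "c": 1}]
labels = [0, 1, 1, 0]
costs = {"a": 2.0, "b": 5.0, "c": 1.0}
lam = 0.5  # assumed trade-off between informativeness and test cost

selected, remaining = [], set(costs)
while remaining and not consistent(rows, labels, selected):
    best = max(remaining, key=lambda a: info_gain(rows, labels, a) / (costs[a] ** lam))
    selected.append(best)
    remaining.remove(best)

print("selected:", selected, "total test cost:", sum(costs[a] for a in selected))
```
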
Attribute Reduction in Object Oriented Concept Lattices Based on Congruence Relations

This paper studies a new definition and an approach to attribute reduction in an object oriented concept lattice based on congruence relations. Firstly, dependence space based on the object oriented concept lattice is researched to obtain the relationship among object oriented concept lattices and the corresponding congruence relations. Then the notion of attribute reduct in this paper, resembling that in rough set theory, is defined to find minimal attribute subsets which can preserve all congruence classes determined by the attribute set. Finally, an approach of discernibility matrix is presented to calculate all attribute reducts. It is shown that attribute reducts can also keep all object oriented extents and their original hierarchy in the object oriented concept lattice.

Xia Wang, Wei-Zhi Wu
An Explicit Sparse Mapping for Nonlinear Dimensionality Reduction

A disadvantage of most nonlinear dimensionality reduction methods is that there are no explicit mappings to project high-dimensional features into a low-dimensional representation space. Previously, some methods have been proposed to provide explicit mappings for nonlinear dimensionality reduction methods. Nevertheless, a disadvantage of these methods is that the learned mapping functions are combinations of all the original features, so it is often difficult to interpret the results. In addition, the dense projection matrices of these approaches cause a high cost of storage and computation. In this paper, a framework based on L1-norm regularization is presented to learn explicit sparse polynomial mappings for nonlinear dimensionality reduction. By using this framework and the method of locally linear embedding, we derive an explicit sparse nonlinear dimensionality reduction algorithm, named sparse neighborhood preserving polynomial embedding. Experimental results on real-world classification and clustering problems demonstrate the effectiveness of our approach.

Ying Xia, Qiang Lu, JiangFan Feng, Hae-Young Bae
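
Assuming scikit-learn is available, the sketch below illustrates the general idea of attaching an explicit, sparse polynomial mapping to a nonlinear embedding: fit ordinary LLE, then regress the embedding on polynomial features with an L1 penalty. It is a schematic stand-in, not the authors' sparse neighborhood preserving polynomial embedding, and the dataset and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Step 1: a nonlinear embedding with no explicit mapping (plain LLE).
Y = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

# Step 2: fit an explicit, sparse polynomial mapping X -> Y via L1 regularization.
poly = PolynomialFeatures(degree=2, include_bias=False)
Phi = poly.fit_transform(X)
maps = [Lasso(alpha=0.01, max_iter=10000).fit(Phi, Y[:, d]) for d in range(Y.shape[1])]

# The learned mapping can now project unseen points; many coefficients are zero.
x_new = X[:5]
y_new = np.column_stack([m.predict(poly.transform(x_new)) for m in maps])
nonzero = sum(int(np.count_nonzero(m.coef_)) for m in maps)
print("embedding of new points:\n", y_new)
print("nonzero polynomial coefficients:", nonzero)
```
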

Applications of Rough Sets

Frontmatter
A Web-Based Learning Support System for Rough Sets

Web-based learning is gaining popularity due to its convenience, ubiquity, personalization, and adaptation features compared with traditional learning environments. The learning subjects of Web-based learning systems are mostly popular sciences; little attention has been paid to learning cutting-edge subjects, and no such systems have been developed for rough sets. This paper presents the design principles, system architecture, and prototype implementation of a Web-based learning support system named Online Rough Sets (ORS). The system is specifically designed for learning rough sets in a student-centered learning environment. Some special features, such as adaptation, are emphasized in the system. The ORS has the ability to adapt to student preference and performance by modifying the size and order of learning materials delivered to each individual. Additionally, it predicts the estimated learning time of each topic, which is helpful for students to schedule their learning pace. A demonstrative example shows that ORS can support students to learn rough sets rationally and efficiently.

Ying Zhou, Jing Tao Yao
A Parallel Matrix-Based Approach for Computing Approximations in Dominance-Based Rough Sets Approach

Dominance-based Rough Sets Approach (DRSA) is a useful tool for solving multi-criteria classification problems. Parallel computing is an efficient way to accelerate problem solving. Computation of approximations is a vital step in finding solutions with rough set methodologies. In this paper, we propose a matrix-based approach for computing approximations in DRSA and design the corresponding parallel algorithms on the Graphics Processing Unit (GPU). A numerical example is employed to illustrate the feasibility of the matrix-based approach. Experimental evaluations show the performance of the parallel algorithm.

Shaoyong Li, Tianrui Li
Propositional Compilation for All Normal Parameter Reductions of a Soft Set

This paper proposes a method for compiling all the normal parameter reductions of a soft set into a conjunction of disjunctive normal forms, which is generated by parameter Boolean atomic formulas. A subset of the parameter set is a normal parameter reduction if and only if the characteristic function of its complementary set is a model of this proposition. Three rules for simplifying this task are developed and combined.

Banghe Han, Xiaonan Li
Top-N Recommendation Based on Granular Association Rules

Recommender systems are popular in e-commerce as they provide users with items of interest. Existing top-K approaches mine the K strongest granular association rules for each user, and then recommend the respective K types of items to her. Unfortunately, in practice, many users need only a list of N items that they would like. In this paper, we propose confidence-based and significance-based approaches exploiting granular association rules to improve the quality of top-N recommendation, especially for new users on new items. We employ the confidence measure and the significance measure, respectively, to select strong rules. The first approach tends to recommend popular items, while the second tends to recommend special ones to different users. We also consider granule selection, which is a core issue in granular computing. Experimental results on the well-known MovieLens dataset show that: 1) the confidence-based approach recommends items more accurately than the significance-based one; 2) the significance-based approach recommends more special items than the confidence-based one; 3) an appropriate setting of granules helps obtain high recommendation accuracy and significance.

Xu He, Fan Min, William Zhu
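
To make the contrast with top-K rule mining concrete, here is a tiny hedged sketch of turning rules into a top-N item list by confidence. The rule format, granules, and items are made up, and the paper's granular association rules and significance measure are considerably richer than this.

```python
from collections import defaultdict

# Hypothetical granular association rules: (user granule, item granule, confidence).
rules = [
    ("age<30",  "action",    0.80),
    ("age<30",  "comedy",    0.65),
    ("student", "animation", 0.72),
    ("age<30",  "drama",     0.40),
]
# Hypothetical mapping from item granules (e.g. genres) to concrete items.
items_by_granule = {
    "action": ["Movie A", "Movie B"], "comedy": ["Movie C"],
    "animation": ["Movie D"], "drama": ["Movie E"],
}

def recommend_top_n(user_granules, n):
    """Rank items by the best confidence of any matching rule, return the top N."""
    score = defaultdict(float)
    for user_g, item_g, conf in rules:
        if user_g in user_granules:
            for item in items_by_granule[item_g]:
                score[item] = max(score[item], conf)
    return sorted(score, key=score.get, reverse=True)[:n]

print(recommend_top_n({"age<30", "student"}, n=3))
```
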
An Efficient BP-Neural Network Classification Model Based on Attribute Reduction

Classification is an important issue in data mining and knowledge discovery, and attribute reduction has been proven to be effective in improving classification accuracy in many applications. In this paper, we first apply rough set theory to remove irrelevant attributes and retain the important ones; input neurons based on the important attributes can simplify the structure of the BP neural network and improve classification accuracy. Then an efficient BP neural network classification model based on attribute reduction is developed for high-dimensional data analysis. Finally, the experimental results demonstrate the efficiency and effectiveness of the proposed model.

Yongsheng Wang, Xuefeng Zheng
Hardware Accelerator Design Based on Rough Set Philosophy

This paper presents a design of a hardware accelerator for algorithms of rough set theory. A hardware implementation of incremental reduct generation and rule induction is proposed. The incremental reduct generation algorithm is based on a simplified discernibility matrix. The design has been simulated and implemented on a Xilinx Artix 7 Field Programmable Gate Array (FPGA) and verified using post-synthesis simulation in Xilinx. The hardware accelerator is generic and easily reconfigurable due to the use of an FPGA. The maximum design frequency achieved is 152 MHz. The proposed hardware accelerator is used for a smart grid application: it extracts important features from the database of the smart grid and generates rules using them. It automates the system, making it more reliable and less prone to human decision making. It is worth noting that the performance of the hardware accelerator becomes more visible when dealing with larger data sets.

K. S. Tiwari, A. G. Kothari, K. S. Sreenivasa Raghavan

Intelligent Systems and Applications

Frontmatter
Properties of Central Catadioptric Circle Images and Camera Calibration

Camera calibration based on circle images has unparalleled advantages in many fields. However, due to the large distortion, catadioptric camera calibration from circles remains a challenging and open problem. Under a central catadioptric camera, circles in a scene are projected to quartic curves on the image plane. Besides the sufficient and necessary conditions that must be satisfied by a paracatadioptric circle image, the properties of the antipodal image points and the absolute conic are both very important for catadioptric camera calibration. In this paper, we study the properties of the antipodal image points on a paracatadioptric circle image, that is, the criterion conditions for the antipodal image points on a circle image. Moreover, we give the image of the absolute conic and derive the constraint equations on the intrinsic parameters of a central catadioptric camera. Finally, we discuss central catadioptric camera calibration using circle images.

Huixian Duan, Lin Mei, Jun Wang, Lei Song, Na Liu
Communication Network Anomaly Detection Based on Log File Analysis

Communication networks today are becoming larger and increasingly complex. Failures in communication systems will cause loss of critical data and even economic losses. Therefore, detecting failures and diagnosing their root cause in a timely manner is essential. Fast and accurate detection of these failures can accelerate problem determination, and thereby improve system reliability. Log files have attracted attention for system and network failure detection, but it is still a challenging task to build an efficient model to detect anomalies from log files. To this end, we propose a novel approach, which aims to detect frequent patterns from log files to build a normal profile, and then to identify anomalous behaviour in log files. The experimental results demonstrate that our approach is an efficient way to detect anomalies, with high accuracy and few false positives.

Xin Cheng, Ruizhi Wang
Using the Multi-instance Learning Method to Predict Protein-Protein Interactions with Domain Information

Identifying protein-protein interactions (PPIs) can help us understand protein function and is critical for understanding the mechanisms of the proteome. Recently, many computational methods, such as domain-based approaches, have been developed for predicting protein-protein interactions. Conventional domain-based methods usually need to infer the interacting domain pairs from already known interacting sets of proteins, and then predict the PPIs. However, it is difficult to obtain detailed information about which domain pairs actually interact for PPI prediction. Therefore, it is of great importance to develop a new computational model which can ignore whether a domain pair is interacting or not. In this paper, we propose a novel method using multi-instance learning (MIL) for predicting protein-protein interactions based on domain information. Firstly, the domain pairs of two proteins are composed. Then, we use the amino acid composition feature encoding method to encode the domain pairs. Finally, two multi-instance learning methods are used for training. The experimental results demonstrate that the proposed method is effective.

Yan-Ping Zhang, Yongliang Zha, Xinrui Li, Shu Zhao, Xiuquan Du
An Improved Decimation of Triangle Meshes Based on Curvature

This paper proposes an improved decimation of triangle meshes based on curvature. Mesh simplification based on vertex decimation is simple and easy to implement. However, previous mesh simplification research based on vertex decimation generally focused on the distance error between the simplified mesh and the original mesh, whereas a high-quality simplified mesh must have low approximation error and preserve the geometric features of the original model. Accordingly, the proposed algorithm improves classical vertex decimation by calculating the mean curvature of each vertex and considering the change of curvature in the local ring. Meanwhile, the algorithm wraps the local triangulation by a global triangulation. Experimental results demonstrate that our approach can preserve the major topological characteristics and geometric features of the initial models after simplifying most vertices, without complicated calculation. It can also reduce the influence of noise and staircase effects in the process of reconstruction, resulting in a smooth surface.

Wei Li, Yufei Chen, Zhicheng Wang, Weidong Zhao, Lin Chen
A Community Detecting Algorithm Based on Granular Computing

Detecting the community structure of a social network is a very challenging and promising research problem today. Granular computing, which can simplify problem solving by generating granules and working in different granularity spaces, is a kind of intelligent information processing model that simulates human thinking. In this paper, a model of mining community structure based on granular computing is proposed by improving the similarity between nodes; that is, a corresponding mining algorithm is designed by decomposing the problem in different granularity spaces so as to realize structure detection. The experimental results on three classic data sets show that the mining algorithm presented in this paper is reasonable.

Lu Liu, Taorong Qiu, Xiaoming Bai, Zhongda Lin
A Robust Online Tracking-Detection Co-training Algorithm with Applications to Vehicle Recognition

Focusing on the vehicle tracking task in a video, we propose an Online Tracking-Detection Co-Training schema that integrates detection and tracking results in a co-training style. The tracker follows the object from frame to frame, and its trajectory is used as one feature view in the co-training process. The detector recognizes the patches containing the given object in the current frame and corrects the tracker in broken frames. Our proposed model is verified through experiments on real-world videos including some challenging situations.

Chen Jiyuan, Wei Zhihua
An Approach for Automatically Reasoning Consistency of Domain-Specific Modelling Language

Domain-Specific Modeling Language (DSML) defined by informal way cannot precisely represent its structural semantics, so properties of models such as consistency cannot be systematically analyzed and verified. In response, the paper proposes an approach for automatically reasoning consistency of DSML. Firstly, we establish a formal framework for DSML based on first-order logic; and then, an automatic mapping mechanism for formalizing DSML is defined; based on this, we present our method for verifying consistency of DSML and its models based on first-order logical inference; finally, the automatic mapping engine for formalizing DSML and its models is designed to show the feasibility of our formal method.

Tao Jiang, Xin Wang, Li-Dong Huang

Knowledge Technology

Frontmatter
Online Object Tracking via Collaborative Model within the Cascaded Feedback Framework

Generative and discriminative models are commonly used in object tracking algorithms. However, as a large number of experiments have shown, a single model is easily influenced by external factors such as occlusion and illumination variation. To address this issue, we propose an online object tracking algorithm based on a collaborative model within a cascaded feedback framework, in which an adaptive generative model is developed that can adapt to a dynamic background. Experimentally, we show that our algorithm is able to outperform state-of-the-art trackers on various benchmark videos.

Sheng Tian, Zhihua Wei
Optimal Subspace Learning for Sparse Representation Based Classifier via Discriminative Principal Subspaces Alignment

Sparse representation based classifier (SRC) has been successfully applied in different pattern recognition tasks. Based on analyses of SRC, we find that SRC is a kind of nearest subspace classifier. In this paper, a new feature extraction algorithm called discriminative principal subspaces alignment (DPSA) is developed according to the geometrical interpretation of SRC. Namely, DPSA aims to find a subspace wherein samples lie close to the hyperplanes spanned by their homogeneous samples and far away from the hyperplanes spanned by their heterogeneous samples. Different from existing SRC-based feature extraction algorithms, DPSA does not need the reconstruction coefficient vectors computed by SRC. Hence, DPSA is much more efficient than the SRC-based feature extraction algorithms. The face recognition experiments conducted on three benchmark face image databases (the AR database, the extended Yale B database and CMU PIE) demonstrate the superiority of our DPSA algorithm.

Lai Wei
A Statistics-Based Semantic Relation Analysis Approach for Document Clustering

Document clustering is a widely researched topic in the area of machine learning. A number of approaches have been proposed to represent and cluster documents. One of the recent trends in document clustering research is to incorporate semantic information into document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. Firstly, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then the underlying semantic relation between terms is also captured by their interaction with other terms. Lastly, these two complementary semantic relations are integrated together to capture the complete semantic information from the original documents. Experimental results show that clustering performance improves significantly by enriching document representation with the semantic information.

Xin Cheng, Duoqian Miao, Lei Wang
Multi-granulation Ensemble Classification for Incomplete Data

A new learning algorithm is introduced that can deal with incomplete data. The algorithm uses a multi-granulation ensemble of classifiers. Firstly, a missing attributes tree (MAT) is constructed according to the missing values of samples. Secondly, the incomplete dataset is projected into a group of data subsets based on the MAT, and these data subsets are used as the training sets for the neural network. Based on the bagging algorithm, each data subset is used to generate a group of classifiers, and a classifier ensemble then gives the final prediction on each data subset. Finally, we adopt conditional entropy as the weighting parameter to overcome the precision insufficiency of dimension-based algorithms. Numerical experiments show that our learning algorithm can reduce the influence of missing attributes on classification results, and it is superior in performance to the compared algorithms.

Yuan-Ting Yan, Yan-Ping Zhang, Yi-Wen Zhang
Heterogeneous Co-transfer Spectral Clustering

With the rapid growth of data collection techniques, it is very common that instances in different domains/views share the same set of categories, or one instance is represented in different domains, which is called co-occurrence data. For example, the multilingual learning scenario contains documents in different languages, and images on social media websites often have accompanying text descriptions. In this paper, we address the problem of automatically clustering instances by making use of multi-domain information. In particular, the information comes from heterogeneous domains, i.e., the feature spaces in different domains are different. A heterogeneous co-transfer spectral clustering framework is proposed with three main steps. The first is to build the relationships across different domains with the aid of co-occurrence data. The next is to construct a joint graph which contains the inter-relationships across different domains and the intra-relationships within each domain. The last is to simultaneously group the instances in all domains by applying spectral clustering on the joint graph. A series of experiments on real-world data sets has shown the good performance of the proposed method compared with state-of-the-art methods.

Liu Yang, Liping Jing, Jian Yu
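
As a rough sketch of the three steps, under the assumption that scikit-learn is available, the code below builds a joint graph from two hypothetical domains (intra-domain RBF affinities plus random stand-in co-occurrence links) and applies spectral clustering to it. It is only a schematic outline of that kind of framework, not the paper's method or data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_text  = rng.normal(size=(60, 20))   # hypothetical text features
X_image = rng.normal(size=(80, 50))   # hypothetical image features

# Intra-domain affinities within each feature space.
W_tt = rbf_kernel(X_text)
W_ii = rbf_kernel(X_image)

# Cross-domain links from co-occurrence data (random 0/1 stand-in:
# entry (i, j) = 1 means text i and image j appear together).
C = (rng.random((60, 80)) > 0.95).astype(float)

# Joint graph: intra-relationships within each domain + inter-relationships across them.
W = np.block([[W_tt, C], [C.T, W_ii]])

labels = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0).fit_predict(W)
text_labels, image_labels = labels[:60], labels[60:]
print(text_labels[:10], image_labels[:10])
```
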
Mixed Pooling for Convolutional Neural Networks

Convolutional Neural Network (CNN) is a biologically inspired trainable architecture that can learn invariant features for a number of applications. In general, CNNs consist of alternating convolutional layers, non-linearity layers and feature pooling layers. In this work, a novel feature pooling method, named mixed pooling, is proposed to regularize CNNs; it replaces the deterministic pooling operations with a stochastic procedure that randomly uses the conventional max pooling and average pooling methods. The advantage of the proposed mixed pooling method lies in its ability to address the over-fitting problem encountered in CNN training. Experimental results on three benchmark image classification datasets demonstrate that the proposed mixed pooling method is superior to max pooling, average pooling and some other state-of-the-art works known in the literature.

Dingjun Yu, Hanli Wang, Peiqiu Chen, Zhihua Wei
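
A minimal NumPy sketch of the mixed-pooling idea follows: during training each pooling window randomly applies max or average pooling, while at inference a simple 50/50 blend is used. The window size, the per-window randomness, and the inference-time convention are assumptions for illustration, not the exact scheme of the paper.

```python
import numpy as np

def mixed_pool2d(x, size=2, training=True, rng=np.random.default_rng(0)):
    """Pool an (H, W) feature map with non-overlapping size x size windows.

    During training each window randomly uses max or average pooling;
    at inference a 50/50 blend of both is used (one simple convention)."""
    h, w = x.shape
    out = np.empty((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            win = x[i:i + size, j:j + size]
            if training:
                out[i // size, j // size] = win.max() if rng.random() < 0.5 else win.mean()
            else:
                out[i // size, j // size] = 0.5 * win.max() + 0.5 * win.mean()
    return out

feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(mixed_pool2d(feature_map))
```
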
An Approach for In-Database Scoring of R Models on DB2 for z/OS

Business analytics is comprehensively used in many enterprises with large-scale data from databases and analytics tools such as R. However, isolation between the database and the data analysis tool increases the complexity of business analytics, because it causes redundant steps such as data migration and creates latent security problems. In this paper, we propose an in-database scoring mechanism, enabling application developers to consume business analytics technology. We also validate the feasibility of the mechanism using the R engine and IBM DB2 for z/OS. The result shows that the in-database scoring technique is applicable to relational databases, largely simplifies the process of business analytics, and, more importantly, preserves data governance, privacy, performance and ownership.

Yikun Xian, Jie Huang, Yefim Shuf, Gene Fuh, Zhen Gao
On the FMα-Integral of Fuzzy-Number-Valued Functions

In this paper, we define the FMα integrals of fuzzy-number-valued functions and discuss their properties. In particular, we give two examples which show that an FMα integrable function is not necessarily fuzzy McShane integrable, and that a fuzzy Henstock integrable function is not necessarily FMα integrable. As the main outcome, we prove that a fuzzy-number-valued function f : I₀ → Eⁿ is FMα integrable on I₀ if and only if there exists an ACGα function F such that F′ = f almost everywhere on I₀.

Yabin Shao, Qiang Ma, Xiaoxia Zhang

Domain-Oriented Data-Driven Data Mining

Frontmatter
Rough Classification Based on Correlation Clustering

In this article we propose a two-step classification method. In the first step it constructs a tolerance relation from the data, and in the second step it uses correlation clustering to construct the base sets, which are used in the classification of the objects. Besides the exposition of the theoretical background, we also show this method in action: we present the details of the classification of the well-known iris data set. Moreover, we pose some open questions concerning this kind of classification.

László Aszalós, Tamás Mihálydeák
An Improved Hybrid ARIMA and Support Vector Machine Model for Water Quality Prediction

Traditionally, the hybrid ARIMA and support vector machine model has often been used in time series forecasting. Due to the unique variability of water quality monitoring data, the hybrid model cannot easily give accurate forecasts. Therefore, this paper proposes an improved hybrid methodology that exploits the strengths of both models for water quality time series prediction. Real water quality data sets provided by the Ministry of Environmental Protection of the People's Republic of China during 2008-2014 were used to examine the forecasting accuracy of the proposed model. The results of computational tests are very promising.

Yishuai Guo, Guoyin Wang, Xuerui Zhang, Weihui Deng
Multi-label Supervised Manifold Ranking for Multi-instance Image Retrieval

Current manifold ranking is mainly used in single-instance image retrieval without considering the prevailing semantic ambiguity problem. This paper introduces multi-instance techniques and supervised information into image retrieval based on manifold ranking, and proposes a Multi-label Supervised Manifold Ranking algorithm (MSMR) for multi-instance image retrieval. The divergence between images is modified by using the multi-label information of training samples. Our method can partly solve the 'input ambiguity problem' in the feature extraction stage and the 'output ambiguity problem' in the output stage. Compared with the traditional Expectation Maximization Diverse Density (EMDD) and Citation-kNN algorithms on the Corel image set, the multi-instance image retrieval experimental results show that the average precision of our algorithm is enhanced.

Xianhua Zeng, Renjie Lv, Hao Lian
Nearest Neighbor Condensation Based on Fuzzy Rough Set for Classification

This work introduces a novel algorithm, called the Condensation rule based on Fuzzy Rough Sets (FRSC), which combines the FCNN rule with fuzzy rough set theory to compute a training-set-consistent subset for the nearest neighbor decision rule. In combination with fuzzy rough set theory, the FRSC rule improves the performance of the FCNN rule. Two variants of the FRSC rule, named FRSC1 and FRSC2, are presented. The FRSC1 rule is suitable for small data sets and FRSC2 adapts to larger data sets. Compared with the FCNN rule, the FRSC1 rule requires much less time and obtains a smaller subset for small data sets. For medium-sized data sets of fewer than 5000 samples, the FRSC2 rule has better time performance than the FCNN rule.

Wei Pan, Kun She, Pengyuan Wei, Kai Zeng
Predicting Movies User Ratings with Imdb Attributes

In the era of Web 2.0, consumers easily share their ratings or comments with other people after watching a movie. User ratings simplify the procedure by which consumers express their opinions about a product, and are a good indicator for predicting the box office [1-4]. This study develops user rating prediction models using classification techniques (linear combination, multiple linear regression, neural networks). The research dataset includes 32,968 movies; 31,506 movies were training data, and the others were testing data. Three research findings are worth summarizing: first, the absolute prediction error of the three models is below 0.82, which indicates that user ratings are well predicted by the models; second, the forecast of the neural network prediction model is more accurate than the others; third, some predictors, such as writers, actors and directors, profoundly affect user ratings. Therefore, investors and movie production companies could invest in an optimal portfolio to increase ROI.

Ping-Yu Hsu, Yuan-Hong Shen, Xiang-An Xie
Feature Selection for Multi-label Learning Using Mutual Information and GA

As in traditional single-label classification, feature selection plays an important role in multi-label classification. This paper presents a multi-label feature selection algorithm, MLFS, which consists of two steps. The first step employs mutual information to complete a local feature selection. Based on the result of the local selection, a genetic algorithm (GA) is adopted to select the globally optimal feature subset while considering the correlations among the labels. Compared with other multi-label feature selection algorithms, MLFS exploits the label correlation to improve performance. The experiments on two multi-label datasets demonstrate that the proposed method is a promising multi-label feature selection method.

Ying Yu, Yinglong Wang

Uncertainty in Granular Computing

Frontmatter
Characterizing Hierarchies on Covering-Based Multigranulation Spaces

Hierarchy plays a fundamental role in the development of Granular Computing (GrC). In many practical applications, the granules are formed in a family of coverings, which can constitute a Covering-based Multigranulation Space (CBMS). Characterizing hierarchies on covering-based multigranulation spaces has therefore become a necessity. To solve this problem, the concepts of the union knowledge distance and the intersection knowledge distance are introduced into the CBMS, which can be used to construct knowledge distance lattices. From the union knowledge distance and the intersection knowledge distance, two partial orderings can be derived, respectively. An example shows that the derived partial orderings can effectively compare the finer or coarser relationships between two different covering-based multigranulation spaces. The theoretical results provide a new approach to covering-based granular computing.

Jingjing Song, Xibei Yang, Yong Qi, Hualong Yu, Xiaoning Song, Jingyu Yang
Uncertainty Measures in Interval-Valued Information Systems

Rough set theory is a mathematical tool to deal with vagueness and uncertainty in artificial intelligence. Approximation accuracy, knowledge granularity and entropy theory are three main approaches to uncertainty research in classical Pawlak information systems, and they have been widely applied to many practical issues. Based on uncertainty measures in Pawlak information systems, we propose rough degree, knowledge discernibility and rough entropy in interval-valued information systems, and investigate some of their important properties. Finally, the relationships between knowledge granulation, knowledge discernibility and rough degree are also discussed.

Nan Zhang, Zehua Zhang
A New Type of Covering-Based Rough Sets

As a technique for granular computing, rough sets deal with vagueness and granularity in information systems. Covering-based rough sets are natural extensions of classical rough sets obtained by relaxing partitions to coverings, and they have been applied in many fields. In this paper, a new type of covering-based rough set is proposed and its properties are studied. First, we introduce the concept of inclusion degree into covering-based rough set theory to explore some properties of the new type of covering approximation space. Second, the new type of covering-based rough set is established based on inclusion degree, and some of its properties are studied. Finally, a simple application of the new type of covering-based rough set to network security is given.

Bin Yang, William Zhu
A Feature Selection Method Based on Variable Precision Tolerance Rough Sets

Feature selection is an important notion in rough sets. This paper presents a feature selection method combining tolerance relations with rough sets. Since practical data sets contain noisy data, this paper investigates a feature selection method based on variable precision tolerance rough sets. The parameter is discussed and its interval is described; as the parameter value changes, the selected features differ. The efficiency of the proposed method is illustrated by an experiment with standard datasets from the UCI repository.

Na Jiao
Incremental Approaches to Computing Approximations of Sets in Dynamic Covering Approximation Spaces

In practical situations, it is of interest to investigate computing approximations of sets as an important step of attribute reduction in dynamic covering information systems. In this paper, we present incremental approaches to computing the type-1 and type-2 characteristic matrices of coverings with the variation of elements. Then we construct the second and sixth lower and upper approximations of sets by using incremental approaches from the view of matrices. We also employ examples to show how to compute approximations of sets by using the incremental and non-incremental approaches in dynamic covering approximation spaces.

Guangming Lang, Qingguo Li, Mingjie Cai, Qimei Xiao

Advances in Granular Computing

Frontmatter
Knowledge Approximations in Multi-scale Ordered Information Systems

The key to granular computing is to make use of granules in problem solving. However, there are different granules at different levels of scale in data sets having hierarchical scale structures. And in real-world applications, there may exist multiple types of data in ordered information systems. Therefore, the concept of multi-scale ordered information systems is first introduced in this paper. The lower and upper approximations in multi-scale ordered information systems are then defined, and their properties are examined.

Shen-Ming Gu, Yi Wu, Wei-Zhi Wu, Tong-Jun Li
An Addition Strategy for Reduct Construction

This paper examines an addition strategy for constructing an attribute reduct based on three-way classification of attributes. Properties of three-way classification of attributes are used to design an algorithm for constructing a reduct by using useful attributes. The algorithm makes sure that every attribute to be added, together with already added attributes, will form a partial reduct (i.e., a subset of a reduct). Based on the results of this paper, it is possible to study a wide class of addition based reduct construction algorithms. Finally, variations of the proposed algorithm are discussed.

Cong Gao, Yiyu Yao
Analysis of User-Weighted π Rough k-Means

Since its introduction by Lingras and West a decade ago, rough k-means has gained increasing attention in academia as well as in practice. A recently introduced extension, π rough k-means, eliminates the need for the weight parameter in rough k-means by applying probabilities derived from Laplace's Principle of Indifference. However, the proposal in its more general form makes it possible to optionally integrate user-defined weights for parameter tuning using techniques such as evolutionary computing. In this paper, we study the properties of this general user-weighted π rough k-means through extensive experiments.

Georg Peters, Pawan Lingras
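
For readers new to rough clustering, here is a heavily simplified, classical Lingras-West-style rough k-means step in Python. The distance-ratio threshold and the lower/upper weights are exactly the user-set parameters that the π variant studied above replaces with probabilities; the numbers and the update convention are illustrative assumptions.

```python
import numpy as np

def rough_kmeans_step(X, centers, threshold=1.3, w_lower=0.7, w_upper=0.3):
    """One assignment + update step of a simplified rough k-means."""
    k = len(centers)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for x in X:
        d = np.linalg.norm(centers - x, axis=1)
        nearest = int(np.argmin(d))
        close = [j for j in range(k) if j != nearest and d[j] <= threshold * d[nearest]]
        if close:                      # ambiguous: object only enters upper approximations
            for j in [nearest] + close:
                upper[j].append(x)
        else:                          # certain: lower (and hence upper) approximation
            lower[nearest].append(x)
            upper[nearest].append(x)
    new_centers = []
    for j in range(k):
        lo = np.mean(lower[j], axis=0) if lower[j] else None
        boundary = [x for x in upper[j] if not any(np.array_equal(x, y) for y in lower[j])]
        bd = np.mean(boundary, axis=0) if boundary else None
        if lo is not None and bd is not None:
            new_centers.append(w_lower * lo + w_upper * bd)
        else:
            new_centers.append(lo if lo is not None else (bd if bd is not None else centers[j]))
    return np.array(new_centers)

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [2.5, 2.6]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
print(rough_kmeans_step(X, centers))
```
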
An Automatic Virtual Calibration of RF-Based Indoor Positioning with Granular Analysis

Positioning methods based on the received signal strength indicator (RSSI) use RSSI values to estimate the position of a mobile device. For RSSI positioning methods based on propagation models, the system's accuracy depends on the adjustment of the propagation model parameters. In an actual indoor environment, the propagation conditions are hardly predictable due to the dynamic nature of the RSSI, and consequently the parameters of the propagation model may change. In this paper, we propose and demonstrate an automatic virtual calibration technology for the propagation model that does not require human intervention and can therefore be performed periodically, following the wireless channel conditions. We also propose a low-complexity Gaussian Filter (GF), Virtual Calibration Technology (VCT), Probabilistic Positioning Algorithm (PPA), and Granular Analysis (GA), which make the proposed algorithm robust to uncertainty and self-adjusting to varying indoor environments. Using MATLAB simulation, we study the calibration performance and system performance, especially the dependence on a number of system parameters and their statistical properties. The simulation results show that our proposed system is an accurate and cost-effective candidate for indoor positioning.

Ye Yin, Zhitao Zhang, Deying Ke, Chun Zhu
Algebraic Structure of Fuzzy Soft Sets

This paper is devoted to the discussion of algebraic structures of fuzzy soft sets. The fuzzy notion of soft equality relation on fuzzy soft sets is proposed and several related properties are investigated. Furthermore, MTL structures of fuzzy soft algebra and complex sample for affiliations and the mapping to the fuzzy soft quotient algebra are established.

Zhiyong Hong, Keyun Qin
A Rapid Granular Method for Minimization of Boolean Functions

A rapid granular method for minimization of Boolean functions is proposed in this paper. Firstly, the Boolean function is transformed into a sum of products. Secondly, the truth table is obtained and statistical information under different knowledge spaces is computed as heuristic information for function minimization. Thirdly, information granules with different granularities are found according to the heuristic information. Finally, if all the terms in the information granules cover the universe, they are the desired result. The algorithm was implemented in MATLAB and experiments have shown its high efficiency.

Zehua Chen, He Ma, Yu Zhang

Big Data to Wise Decisions

Frontmatter
QoS-Aware Cloud Service Selection Based on Uncertain User Preference

With the growing number of alternative services being deployed by cloud service providers, and because users usually can only provide uncertain QoS (Quality of Service) preferences to providers, it becomes difficult to select the most suitable service to satisfy users' needs. In this paper, we propose a novel model of cloud service selection which considers the uncertainty of users' subjective and objective weight preferences. Based on this model, we first analyze the incompleteness and fuzziness of user preferences, obtaining the subjective weight preference by intuitionistic fuzzy sets and the objective weight by the attribute significance of rough sets. Then we transform uncertain QoS preference-aware cloud service selection into a multiple attribute decision-making problem and use the technique for order of preference by similarity to an ideal solution (TOPSIS) to select the best service for the user. Lastly, we conduct a case study on cloud storage service selection to show the effectiveness and advantages of our approach.

Bin Mu, Su Li, Shijin Yuan
Attribute Reduction in Decision-Theoretic Rough Set Model Using MapReduce

Attribute reduction is one of the most important research issues in decision-theoretic rough set model. This paper studies a new attribute measure preserving boundary region partition for a reduct. The relationships among the positive region, the probabilistic positive region and the indiscernibility object pairs for an equivalence class are analyzed. A heuristic attribute reduction algorithm framework using MapReduce in decision-theoretic rough set model is proposed. This study gives some insights into how to conduct attribute reduction in decision-theoretic rough set for big data.

Jin Qian, Ping Lv, Qingjun Guo, Xiaodong Yue
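
To give the MapReduce flavour some shape, the following single-machine analogue (plain Python, no Hadoop) counts positive-region objects for an attribute subset by emitting (equivalence class, decision) pairs and reducing per class. The table and the key/value layout are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

# Hypothetical decision table rows: (condition attribute values, decision).
rows = [
    (("sunny", "hot"), "no"), (("sunny", "mild"), "yes"),
    (("rainy", "hot"), "no"), (("rainy", "hot"), "yes"),
]

def map_phase(rows):
    """Emit (equivalence-class key, decision) pairs, one per object."""
    for cond, dec in rows:
        yield cond, dec

def reduce_phase(pairs):
    """Per equivalence class, count objects whose decisions are consistent."""
    groups = defaultdict(list)
    for key, dec in pairs:
        groups[key].append(dec)
    return sum(len(v) for v in groups.values() if len(set(v)) == 1)

print("positive region size:", reduce_phase(map_phase(rows)))  # 2
```
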
A Context-Aware Recommender System with a Cognition Inspired Model

The development of information technologies raises the problem of information overload. Recommender systems aim to choose the best application or content from numerous applications or contents. And contextual information has been taken into account to improve the recommendation accuracy. Inspired by a cognitive architecture named ACT-R, this paper combines frequency and recency into contextual information to provide context-based recommendations for mobile applications. The experimental results show the ACT-R inspired method is effective in context-based recommendations.

Liangliang Zhao, Jiajin Huang, Ning Zhong
Study on Fuzzy Comprehensive Evaluation Model of Education E-government Performance in Colleges and Universities

First, the paper obtains a university education e-government performance evaluation index framework by making use of the Delphi method. Then, it constructs a comprehensive quality evaluation hierarchy model by applying the analytic hierarchy process to obtain the weight for each index, based on which a fuzzy comprehensive evaluation model is established, thus providing a new method for university education e-government performance evaluation. Examples have proven the feasibility and effectiveness of this method.

Fang Yu, Lijuan Ye, Jiaming Zhong
Parallel Attribute Reduction Based on MapReduce

With the explosive growth of data, a variety of parallel attribute reduction algorithms have been studied. To improve efficiency, this paper proposes a new parallel attribute reduction algorithm based on MapReduce. It contains three parts: parallel computation of a simplified decision table, parallel computation of attribute significance, and parallel computation of the decision table. Experiments are conducted on data of different sizes. The experimental results show that our algorithm is able to process massive data efficiently.

Dachao Xi, Guoyin Wang, Xuerui Zhang, Fan Zhang
Dynamic Ensemble of Rough Set Reducts for Data Classification

Ensemble learning, also known as an ensemble of multiple classifiers, is one of the hot topics in machine learning. Ensemble learning can improve not only the accuracy but also the efficiency of a classification system. Constructing the component classifiers in ensemble learning is crucial, because it has a direct influence on the performance of the classification system. In the construction of component classifiers, it should be guaranteed that the constructed component classifiers possess a certain accuracy and diversity. Based on the confidence degree of classifiers, this paper presents a three-step approach to dynamically integrate rough set reducts. Firstly, multiple reducts are computed. Secondly, multiple component classifiers with a certain diversity are trained on the different reducts. Finally, these component classifiers are integrated by adopting a dynamic integration strategy. The experimental results show that the proposed algorithm is efficient and feasible.

Jun-Hai Zhai, Xi-Zhao Wang, Hua-Chao Wang

Rough Set Theory

Frontmatter
Intuitionistic Fuzzy Rough Approximation Operators Determined by Intuitionistic Fuzzy Triangular Norms

In this paper, relation-based intuitionistic fuzzy rough approximation operators determined by an intuitionistic fuzzy triangular norm T are investigated. By employing an intuitionistic fuzzy triangular norm T and its dual intuitionistic fuzzy triangular conorm, lower and upper approximations of intuitionistic fuzzy sets with respect to an intuitionistic fuzzy approximation space are first introduced. Properties of T-intuitionistic fuzzy rough approximation operators are then examined. Relationships between special types of intuitionistic fuzzy relations and properties of T-intuitionistic fuzzy rough approximation operators are further explored.

Wei-Zhi Wu, Shen-Ming Gu, Tong-Jun Li, You-Hong Xu
Covering Approximations in Set-Valued Information Systems

As one of three basic theories of granular computing, rough set theory provides a useful tool for dealing with granularity in information systems. Covering-based rough set theory is a generalization of this theory for handling covering data, which frequently appear in set-valued information systems. In this paper, we propose a covering in terms of attribute sets in a set-valued information system and study its corresponding three types of covering approximations. Moreover, we show that the covering approximation operators induced by indiscernible neighborhoods and neighborhoods are equal to the approximation operators induced by the tolerance and similarity relations, respectively. Meanwhile, the covering approximation operators induced by complementary neighborhoods are equal to the approximation operators induced by the inverse of the similarity relation. Finally, by introducing the concept of relational matrices, the relationships of these approximation operators are equivalently represented.

Yanqing Zhu, William Zhu
Rough Fuzzy Set Model for Set-Valued Ordered Fuzzy Decision System

The classical rough set theory cannot be directly used to reduce knowledge in set-valued ordered fuzzy decision systems. Firstly, we propose a dominance relation-based rough fuzzy set model in set-valued ordered fuzzy decision systems, and some important properties are investigated. Then, based on the rough fuzzy set, the definitions of approximation consistent set and assignment consistent set are given. Judgment theorems for approximation consistent sets and assignment consistent sets are also obtained; meanwhile, an attribute reduction approach based on discernibility matrices is proposed to eliminate redundant attributes that are not essential from the view of fuzzy decisions. Finally, an example is given to illustrate the effectiveness of the proposed method.

Zhongkui Bao, Shanlin Yang, Ju Zhao
Optimal-Neighborhood Statistics Rough Set Approach with Multiple Attributes and Criteria

This paper focuses on sorting problems with multiple types of attributes, which are divided into qualitative attributes, quantitative attributes, qualitative criteria, and quantitative criteria. Granules of knowledge are defined by applying four types of relations simultaneously: an indiscernibility relation on the qualitative attributes, a similarity relation on the quantitative attributes, a dominance relation on the qualitative criteria, and a quasi-partial order relation on the quantitative criteria. To guarantee the tolerance of the system, the threshold is adjusted, which gives rise to an N-neighborhood system. The consistency measure, which possesses the property of monotonicity, is treated as a likelihood function, so the optimal threshold is obtained by maximum likelihood estimation; as a result, the N-neighborhood system is converted into an optimal 1-neighborhood system. We therefore propose the optimal-neighborhood statistics rough set approach with multiple attributes and criteria.

WenBin Pei, He Lin, LingYue Li
Thresholds Determination for Probabilistic Rough Sets with Genetic Algorithms

Probabilistic rough sets define the lower and upper approximations and the corresponding three regions by using a pair of (α, β) thresholds. Many attempts have been made to determine or calculate effective (α, β) threshold values. A common principle in these approaches is to combine some intelligent technique with a repetitive process in order to optimize different properties of rough set based classification. In this article, we investigate an approach based on genetic algorithms that repeatedly modifies the thresholds while reducing the overall uncertainty of the rough set regions. A demonstrative example suggests that the proposed approach determines useful threshold values within a few iterations. It is also argued that the proposed approach provides results similar to those of some existing approaches, such as game-theoretic rough sets.
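A minimal sketch of the idea, assuming the overall uncertainty of the three regions is measured by the size-weighted entropy of the conditional probability within each region; the toy equivalence-class probabilities and the simple crossover-and-mutation scheme are illustrative, not the exact formulation in the paper.

# Illustrative sketch: evolve (alpha, beta) with a tiny genetic algorithm so
# that the size-weighted entropy of the three probabilistic regions is low.
# The conditional probabilities Pr(C | [x]) and class sizes are toy values.
import random
from math import log2

classes = [(0.95, 10), (0.70, 8), (0.50, 6), (0.30, 9), (0.05, 12)]

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def uncertainty(alpha, beta):
    """Size-weighted entropy over the positive, boundary and negative regions."""
    regions = {"pos": [], "bnd": [], "neg": []}
    for p, n in classes:
        key = "pos" if p >= alpha else "neg" if p <= beta else "bnd"
        regions[key].append((p, n))
    total_n = sum(n for _, n in classes)
    cost = 0.0
    for items in regions.values():
        if items:
            size = sum(n for _, n in items)
            p_c = sum(p * n for p, n in items) / size
            cost += (size / total_n) * h(p_c)
    return cost

def evolve(generations=50, pop_size=20):
    pop = [sorted((random.random(), random.random()), reverse=True) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ab: uncertainty(*ab))       # fittest (lowest cost) first
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            alpha = (a[0] + b[0]) / 2 + random.gauss(0, 0.05)   # crossover + mutation
            beta = (a[1] + b[1]) / 2 + random.gauss(0, 0.05)
            alpha, beta = max(alpha, beta), min(alpha, beta)
            children.append([min(max(alpha, 0.5), 1.0), min(max(beta, 0.0), 0.5)])
        pop = parents + children
    return min(pop, key=lambda ab: uncertainty(*ab))

print(evolve())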

Babar Majeed, Nouman Azam, Jing Tao Yao

Three-Way Decisions, Uncertainty, and Granular Computing

Frontmatter
Three-Way Weighted Entropies and Three-Way Attribute Reduction

Rough set theory (RS-Theory) is a fundamental model of granular computing (GrC) for uncertainty information processing, and information entropy theory provides an effective approach for its uncertainty representation and attribute reduction. Thus, this paper hierarchically constructs three-way weighted entropies (i.e., the likelihood, prior, and posterior weighted entropies) by adopting a GrC strategy from the concept level to classification level, and it further explores three-way attribute reduction (i.e., the likelihood, prior, and posterior attribute reduction) by resorting to a novel approach of Bayesian inference. From two new perspectives of GrC and Bayesian inference, this study provides some new insights into the uncertainty measurement and attribute reduction of information theory-based RS-Theory.

Xianyong Zhang, Duoqian Miao
Applying Three-way Decisions to Sentiment Classification with Sentiment Uncertainty

Sentiment uncertainty is a key problem in sentiment classification. In this paper, we mainly focus on two issues involving sentiment uncertainty, namely context-dependent and topic-dependent sentiment classification. This is the first work that applies three-way decisions to sentiment classification from the perspective of the decision-theoretic rough set model. We discuss the relationship between sentiment classification rules and the thresholds involved in three-way decisions, and prove this relationship. The experimental results on real data sets show that our methods are satisfactory and achieve better performance.

Zhifei Zhang, Ruizhi Wang
Three-Way Formal Concept Analysis

In this paper, a novel concept formation mechanism and novel concept lattices are developed with respect to a binary information table to support three-way decisions. The three-way operators and their inverses are defined, and their properties are given. Based on these operators, two types of three-way concepts are defined and the corresponding three-way concept lattices are constructed. Three-way concept lattices provide a new kind of model for making three-way decisions.
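A minimal sketch, under one standard reading of three-way concept analysis, of a pair of derivation operators on a binary table: the positive operator collects the attributes shared by every object in a set, and the negative operator the attributes possessed by none of them; the toy context is hypothetical.

# Illustrative sketch: for a set of objects in a binary formal context, compute
# the attributes common to all of them (positive operator) and the attributes
# possessed by none of them (negative operator); the pair of the two attribute
# sets plays the role of a three-way counterpart to the classical intent.
context = {          # object -> set of attributes it has (toy data)
    "x1": {"a", "b"},
    "x2": {"a", "c"},
    "x3": {"b", "c"},
}
ATTRS = {"a", "b", "c"}

def positive(objects):
    """Attributes shared by every object in the set."""
    return set.intersection(*(context[o] for o in objects)) if objects else set(ATTRS)

def negative(objects):
    """Attributes that no object in the set possesses."""
    return ATTRS - set.union(*(context[o] for o in objects)) if objects else set(ATTRS)

X = {"x1", "x2"}
print(positive(X), negative(X))   # {'a'} and set() for the toy context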

Jianjun Qi, Ling Wei, Yiyu Yao
Three-Way Decision Based on Belief Function

In this paper, the basic knowledge of three-way decisions and Dempster-Shafer evidence theory is reviewed. A new three-way decision model based on a belief function is then proposed, in which the probability function of the classical three-way decision model is replaced with a belief function. In addition to the decision rules proposed for this model, some of its properties are also discussed. Meanwhile, the universe is divided into three disjoint regions according to the values of the belief function and the corresponding decision rules. Finally, a comprehensive illustration is presented to verify the effectiveness and feasibility of the model.

Zhan’ao Xue, Jie Liu, Tianyu Xue, Tailong Zhu, Penghan Wang
Semantically Enhanced Clustering in Retail Using Possibilistic K-Modes

Possibility theory can be used to translate numeric values into a semantically more meaningful representation with the help of linguistic variables. Data mining applied to a dataset with linguistic variables can lead to results that are easily interpretable due to the semantics inherent in the representation. Moreover, data mining algorithms based on these linguistic variables tend to orient themselves according to the underlying semantics. This paper describes how to transform a real-world dataset of numeric values using linguistic variables based on possibilistic variables. The transformed dataset is clustered using a recently proposed possibilistic k-modes algorithm. The resulting cluster profiles are semantically accessible with very little numerical analysis.

Asma Ammar, Zied Elouedi, Pawan Lingras
A Three-Way Decisions Clustering Algorithm for Incomplete Data

Clustering is one of the most widely used approaches in data mining for finding potential data structure. However, missing values arise in real data sets for several reasons, such as the difficulties and limitations of data acquisition and random noise, and most clustering methods cannot deal with incomplete data sets directly. For this reason, this paper proposes a three-way decisions clustering algorithm for incomplete data based on attribute significance and miss rate. Three-way decisions with interval sets naturally partition a cluster into a positive region, a boundary region, and a negative region, which is well suited to soft clustering. First, the data set is divided into four parts, namely sufficient data, valuable data, inadequate data, and invalid data, according to domain knowledge about attribute significance and miss rate. Second, different strategies based on three-way decisions are devised to handle the four types. The experimental results on several data sets give preliminary evidence of the effectiveness of the proposed algorithm.
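A minimal sketch of the first step described above: splitting an incomplete data set into the four groups named in the abstract according to the miss rate over a set of significant attributes; the cut-off values and the use of None for missing entries are assumptions made for illustration.

# Illustrative sketch: split an incomplete data set into the four groups named
# in the abstract, using the miss rate over significant attributes; None marks
# a missing value and the two cut-offs are hypothetical.

def miss_rate(row, significant):
    return sum(row[a] is None for a in significant) / len(significant)

def split_by_missingness(data, significant, low=0.0, high=0.5):
    groups = {"sufficient": [], "valuable": [], "inadequate": [], "invalid": []}
    for i, row in enumerate(data):
        r_sig = miss_rate(row, significant)
        r_all = sum(v is None for v in row) / len(row)
        if r_all == 0:
            groups["sufficient"].append(i)      # complete object
        elif r_sig <= low:
            groups["valuable"].append(i)        # all significant attributes present
        elif r_sig <= high:
            groups["inadequate"].append(i)      # some significant values missing
        else:
            groups["invalid"].append(i)         # too incomplete to use
    return groups

data = [[1.0, 2.0, 0.5], [1.2, None, 0.4], [None, None, 0.3], [None, 1.1, None]]
print(split_by_missingness(data, significant=[0, 1]))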

Hong Yu, Ting Su, Xianhua Zeng
Sentiment Analysis with Automatically Constructed Lexicon and Three-Way Decision

An unsupervised sentiment analysis method is presented to classify user comments on laptops into positive and negative ones. The method automatically extracts informative features from the test dataset and labels the sentiment polarity of each feature to build a domain-specific lexicon. The classification accuracy achieved with this lexicon is compared to that achieved with an existing general-purpose sentiment lexicon. In addition, the concept of three-way decision is applied in the classifier, combining lexicon-based and supervised learning methods. Results indicate that three-way decision yields considerable improvements in overall performance.
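A minimal sketch of the combination described above: a lexicon score yields a normalized polarity, two thresholds accept or reject directly, and the uncertain middle band is deferred to a supervised classifier; the lexicon entries, threshold values, and fallback model are all illustrative assumptions.

# Illustrative sketch: lexicon scoring plus three-way decision. Comments with
# clear polarity are accepted or rejected directly; the uncertain middle band
# is deferred to a supervised classifier (here a placeholder callable).
LEXICON = {"great": 1, "fast": 1, "light": 1, "slow": -1, "noisy": -1, "heavy": -1}
ALPHA, BETA = 0.6, 0.4          # hypothetical acceptance/rejection thresholds

def lexicon_score(text):
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.5              # no evidence either way
    return (sum(hits) / len(hits) + 1) / 2   # map [-1, 1] onto [0, 1]

def classify(text, fallback):
    p = lexicon_score(text)
    if p >= ALPHA:
        return "positive"
    if p <= BETA:
        return "negative"
    return fallback(text)       # deferred: let the supervised model decide

print(classify("great screen but noisy fan and slow disk",
               fallback=lambda t: "negative"))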

Zhe Zhou, Weibin Zhao, Lin Shang
A Novel Intelligent Multi-attribute Three-Way Group Sorting Method Based on Dempster-Shafer Theory

Multi-attribute group sorting (MAGS) has become a popular subject in multi-attribute decision making. The optimization preference disaggregation method and the outranking relation method are frequently used to solve this kind of problem. However, when faced with an MAGS problem with many attributes and alternatives, these methods show limitations such as intensive computation and the difficulty of determining the necessary parameters. To overcome these limitations, we propose an intelligent three-way group sorting method based on Dempster-Shafer theory for obtaining a more credible sorting result. In the proposed method, decision evidences are constructed by computing the fuzzy memberships of an alternative belonging to the decision classes; Dempster's combination rule is then used to aggregate these evidences for the final group sorting. In the end, a simulation example is employed to show the effectiveness of the new method.

Baoli Wang, Jiye Liang
Dynamic Maintenance of Three-Way Decision Rules

Decision-theoretic rough sets provide a three-way decision framework for approximating a target concept, with an error-tolerance capability for handling uncertainty by using a pair of probability thresholds. The three-way decision rules of acceptance, rejection, and deferment can be derived directly from the three regions implied by the rough set approximations. In reality, the decision environment tends to be dynamic rather than static: as the data change continuously, the three regions of a target decision inevitably change, and the induced three-way decision rules change accordingly. In this paper, we discuss the principles for dynamically maintaining three-way decision rules based on the variation of the three regions when an object is added incrementally. Decision rules can then be updated incrementally, without re-computing rule sets from scratch, when a new object is added to an information system.
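A minimal sketch of the incremental idea, assuming an (alpha, beta) pair of thresholds and per-class counts: when a new object arrives, only the conditional probability of its own equivalence class is recomputed and that class alone is moved between the three regions; the data structures and threshold values are illustrative.

# Illustrative sketch: when a new object joins an equivalence class, only the
# conditional probability of that class is recomputed and the class is moved
# between the positive, boundary and negative regions; everything else stays.
ALPHA, BETA = 0.7, 0.3

# equivalence class -> [members of target concept, class size] (toy counts)
stats = {"e1": [8, 10], "e2": [3, 10], "e3": [5, 10]}
regions = {"pos": {"e1"}, "neg": {"e2"}, "bnd": {"e3"}}

def region_of(pos_count, size):
    p = pos_count / size
    return "pos" if p >= ALPHA else "neg" if p <= BETA else "bnd"

def add_object(eq_class, in_concept):
    """Incrementally update the three regions after one insertion."""
    pos, size = stats[eq_class]
    stats[eq_class] = [pos + int(in_concept), size + 1]
    for r in regions.values():
        r.discard(eq_class)                     # remove the stale assignment
    regions[region_of(*stats[eq_class])].add(eq_class)

add_object("e3", in_concept=True)   # e3: 6/11, stays in the boundary region
add_object("e2", in_concept=True)   # e2: 4/11, moves from negative to boundary
print(regions)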

Chuan Luo, Tianrui Li, Hongmei Chen
An Overview of Function Based Three-Way Decisions

By considering the various studies on loss functions in three-way decisions, a function-based three-way decision framework is proposed to generalize the existing models. A “four-level” approach with a granular perspective is built, and the existing models can be categorized into this four-level framework through different decision criteria. Our work provides a novel “granularity” viewpoint on current three-way decision research.

Dun Liu, Decui Liang
Multicost Decision-Theoretic Rough Sets Based on Maximal Consistent Blocks

Decision-theoretic rough sets originate from the Bayesian decision procedure, in which a pair of thresholds is derived from a cost matrix for constructing a probabilistic rough set. However, the classical decision-theoretic rough set can only deal with complete information systems, and it does not take the variation of cost into consideration. To solve these two problems, the maximal consistent block is introduced into the construction of decision-theoretic rough sets using multiple cost matrices. Our approach includes optimistic and pessimistic multicost decision-theoretic rough set models. Furthermore, the whole decision costs of the optimistic and pessimistic multicost decision-theoretic rough sets are calculated in decision systems. This study suggests potential application areas and new research trends concerning decision-theoretic rough sets.

Xingbin Ma, Xibei Yang, Yong Qi, Xiaoning Song, Jingyu Yang
A Method to Reduce Boundary Regions in Three-Way Decision Theory

A method for dealing with the boundary region in three-way decision theory is proposed. In three-way decision theory, all elements are divided into three regions: the positive region, the negative region, and the boundary region. The positive region leads to a decision of acceptance and the negative region to a decision of rejection; both generate certain rules. The boundary region, however, leads to a decision of abstaining and generates uncertain rules, yet in classification the boundary region must still be dealt with. In this paper, we propose a method based on the tri-training algorithm to reduce the boundary region. In the tri-training algorithm, we build three classifiers based on three-way decisions, dividing the data into three parts randomly so as to keep the classifiers different, and adopt a voting mechanism to label test samples. Experiments show that in most cases the tri-training algorithm not only reduces the boundary region but also improves classification precision. We also identify some rules about how the parameters alpha and beta affect the boundary region and classification precision.
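A minimal sketch of the tri-training step, assuming three classifiers trained on different bootstrap views of the certain (non-boundary) data and a unanimous vote as the condition for moving a sample out of the boundary region; the classifier type and the agreement rule are illustrative assumptions.

# Illustrative sketch: three classifiers trained on different bootstrap views
# vote on the deferred (boundary-region) samples; a sample is relabelled only
# when at least `agree` of the three classifiers concur, shrinking the boundary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tri_train(X, y, X_boundary, agree=3):
    rng = np.random.default_rng(0)
    models = []
    for _ in range(3):                          # three different bootstrap views
        idx = rng.choice(len(X), size=len(X), replace=True)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    votes = np.stack([m.predict(X_boundary) for m in models])
    decided, labels = [], []
    for j in range(X_boundary.shape[0]):
        vals, counts = np.unique(votes[:, j], return_counts=True)
        if counts.max() >= agree:               # enough agreement: leave the boundary
            decided.append(j)
            labels.append(vals[np.argmax(counts)])
    return decided, labels

X = np.array([[0.1], [0.2], [0.8], [0.9], [0.15], [0.85]])
y = np.array([0, 0, 1, 1, 0, 1])
print(tri_train(X, y, X_boundary=np.array([[0.5], [0.05], [0.95]])))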

Ping Li, Lin Shang, Huaxiong Li
An Integrated Method for Micro-blog Subjective Sentence Identification Based on Three-Way Decisions and Naive Bayes

Subjective sentence identification in micro-blogs is the basis of further research on public opinion analysis, so its accuracy is crucial for future work. Owing to imprecise or incomplete information, the precision of traditional SVM, Naive Bayes, and other machine learning algorithms for micro-blog subjective sentence identification is not ideal. This paper presents a method that integrates three-way decisions and the Naive Bayes algorithm to identify subjective sentences in micro-blogs. Experimental results show that, compared with the traditional Bayesian algorithm, the proposed integrated approach significantly improves the accuracy of subjective sentence identification.

Yanhui Zhu, Hailong Tian, Jin Ma, Jing Liu, Tao Liang
Probabilistic Rough Set Model Based on Dominance Relation

Unlike Pawlak rough sets, probabilistic rough set models tolerate a degree of inaccuracy in the lower and upper approximations. A dominance relation, however, cannot establish a probability measure space on the universe. In this paper, the basic set assignment function, namely the partition function, is introduced; it transforms the non-probability measure generated by a dominance relation into a probability measure space. The probabilistic rough set model based on the dominance relation is then established and explained clearly through an example.

Wentao Li, Weihua Xu
Backmatter
Metadata
Title
Rough Sets and Knowledge Technology
Editors
Duoqian Miao
Witold Pedrycz
Dominik Ślȩzak
Georg Peters
Qinghua Hu
Ruizhi Wang
Copyright Year
2014
Electronic ISBN
978-3-319-11740-9
Print ISBN
978-3-319-11739-3
DOI
https://doi.org/10.1007/978-3-319-11740-9
