
2016 | Book

Rough Sets

International Joint Conference, IJCRS 2016, Santiago de Chile, Chile, October 7–11, 2016, Proceedings

Editors: Víctor Flores, Fernando Gomide, Andrzej Janusz, Claudio Meneses, Duoqian Miao, Georg Peters, Dominik Ślęzak, Guoyin Wang, Richard Weber, Yiyu Yao

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the International Joint Conference on Rough Sets, IJCRS 2016, held in Santiago de Chile, Chile, in October 2016.

The 46 revised full papers presented together with 7 keynotes, tutorials and expert papers were carefully reviewed and selected from 108 submissions. The papers are grouped in topical sections on Rough Sets, Approximation and Granulation; Rough Sets, Non-Determinism and Incompleteness; Rough Sets and Three-way Decisions; Fuzziness and Similarity in Knowledge Representation; Machine Learning and Decision Making; Ranking and Clustering; Derivation and Application of Rules and Trees; Derivation and Application of Feature Subsets.

Table of Contents

Frontmatter

Keynotes, Tutorials and Expert Papers

Frontmatter
Advances in Rough and Soft Clustering: Meta-Clustering, Dynamic Clustering, Data-Stream Clustering

Over the last five decades, clustering has established itself as a primary unsupervised learning technique. In most major data mining projects, clustering can serve as a first step in understanding the available data. Clustering is used for creating meaningful profiles of entities in an application; it can also be used to compress a dataset into more manageable granules. The initial methods for crisp clustering of objects represented by numeric attributes have evolved to address the demands of the real world. These extensions include the use of soft computing techniques such as fuzzy and rough set theory, the use of centroids and medoids for computational efficiency, modes to accommodate categorical attributes, dynamic and stream clustering for managing the continuous accumulation of data, and meta-clustering for correlating parallel clustering processes. This paper uses applications in engineering, web usage, retail, finance, and social networks to illustrate some of the recent advances in clustering and their role in improved profiling, as well as in augmenting prediction, classification, association mining, dimensionality reduction, and optimization tasks.

Pawan Lingras, Matt Triff
Clinical Narrative Analytics Challenges

Precision medicine, or evidence-based medicine, is based on the extraction of knowledge from medical records to provide individuals with the appropriate treatment at the appropriate moment according to the patient's features. Despite the efforts to use clinical narratives for clinical decision support, many challenges still have to be faced today, such as multilinguality, the diversity of terms and formats across services, acronyms, and negation, to name but a few. The same problems arise when one wants to analyze narratives in the literature, whose analysis would provide physicians and researchers with highlights. In this talk we will analyze challenges, solutions and open problems, and will examine several frameworks and tools that can perform NLP over free text to extract medical entities by means of a Named Entity Recognition process. We will also analyze a framework we have developed to extract and validate medical terms. In particular, we present two use cases: (i) extraction of medical entities from a set of infectious disease description texts provided by MedlinePlus, and (ii) identification of stroke scales in clinical narratives written in Spanish.

Ernestina Menasalvas, Alejandro Rodriguez-Gonzalez, Roberto Costumero, Hector Ambit, Consuelo Gonzalo
Modern ICT and Mechatronic Systems in Contemporary Mining Industry

The paper deals with modern ICT techniques and systems, and with mechatronic systems for the mining industry, with particular attention paid to results achieved by the authors and their research groups. The IT systems concern process and machinery monitoring, fault detection and isolation for processes and machinery, and assessment of risks and hazards in the mining industry. Furthermore, innovative applications of AI methods are addressed, including pattern recognition and interpretation for process control, classification of seismic events, estimating loads of conveyors, and others. Special attention is paid to applications of mechatronic solutions, such as unmanned working machinery and longwalls in coal mines, and specialised robots for basic work. Mobile robots for inspecting areas of mines affected by catastrophes are presented, too. Moreover, recent communication solutions for collision avoidance, localisation of mining machinery, and wireless transmission are addressed. The paper concludes with the most likely developments of ICT and mechatronic systems for the mining industry.

Wojciech Moczulski, Piotr Przystałka, Marek Sikora, Radosław Zimroz
Rough Sets of Zdzisław Pawlak Give New Life to Old Concepts. A Personal View

Zdzisław Pawlak influenced our thinking about uncertainty by borrowing the idea of approximation from geometry and topology and carrying those ideas into the realm of knowledge engineering. In this way, simple and already much worn-out mathematical notions gained a new life, given to them by new notions of decision rules and algorithms, complexity problems, and problems of optimization of relations and rules. The author presents his personal remembrances of how his work was influenced by Zdzisław Pawlak, interlaced with discussions of highlights of research done in enlivening classical concepts in new frameworks; he then turns to more recent results that stem from those foundations, mostly on applications of rough mereology in behavioral robotics and classifier synthesis via granular computing.

Lech T. Polkowski
Rough Set Approaches to Imprecise Modeling

In the classical rough set approaches, mainly lower approximations of single decision classes have been treated. Based on those approximations, attribute reduction and rule induction have been developed. In this paper, drawing on the authors' recent studies, we demonstrate that various analyses become conceivable by treating lower approximations of unions of multiple decision classes.

Masahiro Inuiguchi
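As a hedged aside (this is an illustrative sketch, not code from the paper), lower approximations of unions of decision classes can be computed directly from the indiscernibility blocks of a small decision table; all names and the example table below are assumptions for illustration:

```python
# Illustrative sketch, not the paper's implementation: Pawlak lower
# approximation of a union of decision classes in a small decision table.
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group object names by their values on the condition attributes."""
    blocks = defaultdict(set)
    for name, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(name)
    return list(blocks.values())

def lower_approximation(blocks, target):
    """Union of all indiscernibility blocks fully contained in `target`.

    For a union of decision classes, `target` is simply the union of the
    objects of those classes."""
    return set().union(*(b for b in blocks if b <= target))

# Hypothetical table: o1 and o2 are indiscernible; take the union of
# decision classes 1 and 2 as the target set.
table = {'o1': {'a': 0, 'b': 0}, 'o2': {'a': 0, 'b': 0},
         'o3': {'a': 0, 'b': 1}, 'o4': {'a': 1, 'b': 1}}
decisions = {'o1': 1, 'o2': 2, 'o3': 3, 'o4': 1}
union_12 = {o for o, d in decisions.items() if d in (1, 2)}
blocks = equivalence_classes(table, ['a', 'b'])
# lower_approximation(blocks, union_12) → {'o1', 'o2', 'o4'}
```

Note that the block {o1, o2} lies in the lower approximation of the union even though neither single class 1 nor class 2 contains it, which is exactly the extra analytical power the abstract points to.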
Rule Set Complexity for Incomplete Data Sets with Many Attribute-Concept Values and “Do Not Care” Conditions

In this paper we present results of novel experiments conducted on 12 data sets with many missing attribute values, interpreted as attribute-concept values and "do not care" conditions. In our experiments the complexity of rule sets induced from such data, in terms of the number of rules and the total number of conditions, is evaluated; simpler rule sets are considered better. Our first objective was to check which interpretation of missing attribute values induces simpler rule sets. There is some evidence that "do not care" conditions are better. Our secondary objective was to test which of the three probabilistic approximations used for rule induction (singleton, subset or concept) induces simpler rule sets. The best choice is the subset probabilistic approximation, while the singleton probabilistic approximation is the worst.

Patrick G. Clark, Cheng Gao, Jerzy W. Grzymala-Busse

Rough Sets, Approximation and Granulation

Frontmatter
Rough Sets Are ε-Accessible

This paper focuses on the relationship between perceptions and sets, considering that perceptions are not only imprecise or doubtful but also multiple. Accessible sets are developed according to this view, where set representation is a central problem that depends not only on the features of a set's objects but also on their perceptions. The accessibility notion is related to perception and can be summarized as "to be accessible is to be perceived", which is weaker than Berkeley's idealism. In this context, we revisit rough sets, showing that: (1) Pawlak's perception of sets can be written using only two perceivers, one pessimistic and one optimistic, and (2) rough sets are $$\varepsilon$$-accessible. Moreover, we introduce a rough set computational theory of perception, denoted $$\pi$$-RST, and discuss the perception dynamics problem, laying its foundation on social interaction between perceivers, granularity and preference.

Mohamed Quafafou
Refinements of Orthopairs and IUML-algebras

In this paper we consider sequences of orthopairs given by refinement sequences of partitions of a finite universe. While operations among orthopairs can be fruitfully interpreted by connectives of three-valued logics, we describe operations among sequences of orthopairs by means of the logic IUML of idempotent uninorms having an involutive negation.

Stefano Aguzzoli, Stefania Boffa, Davide Ciucci, Brunella Gerla
Rough Approximations Induced by Orthocomplementations in Formal Contexts

Formal contexts are a common framework for rough set theory and formal concept analysis, and several rough set models in formal contexts have been proposed. In this paper, based on the theory of abstract approximation spaces presented by Cattaneo [1], a Brouwer orthocomplementation on the set of objects of a formal context is presented; as a result, a pair of new lower and upper rough approximation operators is introduced. A comparison between the new approximation operators and the existing ones is made, and two necessary and sufficient conditions for the equivalence of the operators are obtained. Relationships and algebraic structures among the definable subsets of these approximation operators are investigated.

Tong-Jun Li, Wei-Zhi Wu, Shen-Ming Gu
On Optimal Approximations of Arbitrary Relations by Partial Orders

The problem of optimal quantitative approximation of an arbitrary binary relation by a partial order is discussed and some solution is provided. It is shown that even for a very simple quantitative measure the problem is NP-hard. Some quantitative metrics are also applied for known property-driven approximations by partial orders.

Ryszard Janicki
On Approximation of Relations by Generalized Closures and Generalized Kernels

Various concepts of closures and kernels are introduced and discussed in the context of approximation of arbitrary relations by relations with specific properties.

Agnieszka D. Bogobowicz, Ryszard Janicki
Ordered Information Systems and Graph Granulation

The concept of an Information System, as used in Rough Set theory, is extended to the case of a partially ordered universe equipped with a set of order preserving attributes. These information systems give rise to partitions of the universe where the set of equivalence classes is partially ordered. Such ordered partitions correspond to relations on the universe which are reflexive and transitive. This correspondence allows the definition of approximation operators for an ordered information system by using the concepts of opening and closing from mathematical morphology. A special case of partial orders are graphs and hypergraphs and these provide motivation for the need to consider approximations on partial orders.

John G. Stell

Rough Sets, Non-Determinism and Incompleteness

Frontmatter
A Rough Perspective on Information in Extensive Form Games

In game theory, imperfect and incomplete information have been intensively addressed. In extensive form games, a player faces imperfect information when it cannot identify the decision node at which it is presently located; the player is only aware of an information set consisting of more than one node. A player faces incomplete information when it is not aware of, e.g., the preferences or payoffs of its opponents. Rough set theory is a prime method for addressing missing and contradicting information in decision tables in which a set of variables induces a decision. In particular, rough set theory provides a means by which records with identical variable values can lead to different, contradicting decisions; to indicate such situations, these records are assigned to the boundaries of all possible decisions. Obviously, both situations, games with imperfect or incomplete information and rough decision tables, are similar with respect to their characteristics and challenges regarding a lack of information. Hence, a discussion of their relationship could be mutually beneficial. Therefore, the objective of our paper is to provide a rough set perspective on extensive form games with imperfect and incomplete information.

Georg Peters
Towards Coordination Game Formulation in Game-Theoretic Rough Sets

The game-theoretic rough set model (GTRS) is a recent advancement in determining the three rough set regions by formulating games between multiple measures. Research on GTRS has focused on competitive games, in which the involved players have opposite interests or incentives. Different types of games can be adopted in GTRS, such as coordination games, cooperation games, and competition games. Coordination games are a class of games in which the involved players have harmonious interests and enforce coordinated behaviors to achieve an efficient outcome. In this paper, we formulate coordination games between measures to determine rough set regions. In particular, we analyze the measures for evaluating equivalence classes, and we determine the rough region to which every equivalence class should belong by formulating coordination games and finding an equilibrium for each equivalence class. The motivation and process of formulating coordination games are discussed in detail, and a demonstrative example shows the feasibility of the proposed approach.

Yan Zhang, Jing Tao Yao
Probabilistic Estimation for Generalized Rough Modus Ponens and Rough Modus Tollens

We review concepts and principles of Modus Ponens and Modus Tollens in the areas of rough set theory and probabilistic inference. Based on the upper and lower approximations of a set, as well as existing probabilistic results, we establish a generalized version of rough Modus Ponens and rough Modus Tollens with a new fact different from the premise (or the conclusion) of an "if ... then ..." rule, and address the problem of computing the conditional probability of the conclusion given the new fact (or of the premise given the new fact) from the probability of the new fact and the certainty factor of the rule. The solutions come down to intervals for the conditional probabilities, which are more appropriate than exact values in an environment full of uncertainty due to errors and inconsistencies in measurement, judgement, management, etc. An illustrative analysis is also provided.

Ning Yao, Duoqian Miao, Zhifei Zhang, Guangming Lang
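The interval-valued solutions the abstract refers to can be illustrated with a standard probability bound (a generic sketch, not the authors' exact formulas): for a rule A → B with certainty factor c = P(B|A) and a new fact with probability a = P(A), the total probability P(B) = c·a + P(B|¬A)·(1 − a) is only determined up to an interval, because P(B|¬A) is unknown and ranges over [0, 1]:

```python
# Generic probability bound illustrating interval-valued inference for a
# rule A -> B (illustrative sketch, not the paper's formulas):
#   P(B) = c * a + P(B|not A) * (1 - a), with P(B|not A) unknown in [0, 1],
# so P(B) lies in [c * a, c * a + (1 - a)].

def modus_ponens_interval(p_a, certainty):
    """Interval for P(B) given P(A) = p_a and certainty factor P(B|A)."""
    lower = certainty * p_a
    upper = certainty * p_a + (1.0 - p_a)
    return lower, upper

# e.g. P(A) = 0.8 and c = 0.9 bound P(B) to approximately [0.72, 0.92]
```

The width of the interval, 1 − P(A), shrinks as the new fact becomes more certain, which matches the intuition that intervals, rather than point values, are the honest answer under uncertainty.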
Definability in Incomplete Information Tables

This paper investigates the issues related to definability in an incomplete information table by using interval sets. We review the existing results pertaining to definability in a complete information table. We generalize the satisfiability of formulas in a description language in a complete table to a pair of strong and weak satisfiability of formulas in an incomplete table, which leads to an interval-set based interpretation of formulas. While we have definable sets in a complete table, we have definable interval sets in an incomplete table. The results are useful for studying concept analysis and approximations with incomplete tables.

Mengjun Hu, Yiyu Yao
Rough Sets by Indiscernibility Relations in Data Sets Containing Possibilistic Information

For data sets containing possibilistic information, rough sets are described by directly using indiscernibility relations. First, we give rough sets based on indiscernibility relations under complete information. Second, we address rough sets by applying possible world semantics to data sets with possibilistic information; these rough sets are used as a correctness criterion for approaches extended to deal with possibilistic information. Third, we extend the approach based on indiscernibility relations to handle data sets with possibilistic information. Rough sets in this extension create the same results as those obtained under possible world semantics, which gives justification to our extension.

Michinori Nakata, Hiroshi Sakai
Matrix-Based Rough Set Approach for Dynamic Probabilistic Set-Valued Information Systems

Set-valued information systems (SvIS), in which the attribute values are set-valued, are an important type of data representation for uncertain and missing information. However, previous investigations in the rough set community do not consider attribute values with probability distributions in SvIS, which may be impractical in many real applications. This paper introduces probabilistic set-valued information systems (PSvIS) and presents an extended variable precision rough set (VPRS) approach based on a $$\lambda$$-tolerance relation for PSvIS. Furthermore, to handle the dynamic variation of attributes in PSvIS, viz., the addition and deletion of attributes, we present a matrix characterization of the proposed VPRS model and discuss some related properties. Incremental approaches for maintaining rough approximations based on matrix operations are then presented, which can effectively accelerate the updating of rough approximations in dynamic PSvIS.

Yanyong Huang, Tianrui Li, Chuan Luo, Shi-jinn Horng

Rough Sets and Three-way Decisions

Frontmatter
Variance Based Determination of Three-Way Decisions Using Probabilistic Rough Sets

Probabilistic rough sets are an important generalization of rough sets in which a pair of thresholds is used to form new rough regions. The pair of thresholds controls different quality-related criteria, such as classification accuracy, precision, uncertainty, and the costs and risks of rough-set-based three-way decision making. In this article, we introduce variance-based criteria for determining the thresholds, including within-region variance, between-region variance, and the ratio of the two variances. In particular, we examine the variance, or spread, of the conditional probabilities of the equivalence classes contained in the different probabilistic regions. We also show that the determination of thresholds may be cast as an optimization of the proposed criteria.

Nouman Azam, Jing Tao Yao
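The threshold pair (α, β) mentioned above induces the standard probabilistic rough set trisection: an equivalence class goes to the positive region if Pr(X | [x]) ≥ α, to the negative region if Pr(X | [x]) ≤ β, and to the boundary otherwise. A minimal sketch of that region-forming step (illustrative only; names and example values are assumptions, not taken from the paper):

```python
# Illustrative sketch of the standard probabilistic rough set trisection:
# POS if Pr(X | [x]) >= alpha, NEG if Pr(X | [x]) <= beta, else BND.

def three_way_regions(blocks, target, alpha, beta):
    """Trisect a family of equivalence classes by conditional probability."""
    regions = {'POS': [], 'BND': [], 'NEG': []}
    for block in blocks:
        p = len(block & target) / len(block)  # Pr(X | [x])
        if p >= alpha:
            regions['POS'].append(block)
        elif p <= beta:
            regions['NEG'].append(block)
        else:
            regions['BND'].append(block)
    return regions

# With alpha = 0.7 and beta = 0.3, the three blocks below have conditional
# probabilities 0.75, 0.5 and 0.25, so one block lands in each region.
blocks = [frozenset({1, 2, 3, 4}), frozenset({5, 6}), frozenset({7, 8, 9, 10})]
X = {1, 2, 3, 5, 7}
regions = three_way_regions(blocks, X, 0.7, 0.3)
```

The variance-based criteria of the paper would then score such trisections, e.g. by the spread of the conditional probabilities within and between the three regions, to pick (α, β).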
Utilizing DTRS for Imbalanced Text Classification

Imbalanced data classification is one of the challenging problems in data mining and machine learning research. Traditional classification algorithms are often biased towards the majority class when learning from imbalanced data. Much work has been done to address this problem, including data re-sampling, algorithm modification, and cost-sensitive learning, but most of it focuses on only one of these techniques. This paper proposes to utilize both algorithm modification and cost-sensitive learning based on the decision-theoretic rough set (DTRS) model. In particular, we use the naive Bayes classifier as the base classifier and modify it for imbalanced learning. For cost-sensitive learning, we adopt the systematic method from DTRS to derive the required thresholds that minimize the decision cost. Our experimental results on three well-known text classification databases show that the unified DTRS approach provides similar performance on balanced class distributions, outperforms the naive Bayes classifier on imbalanced datasets, and is competitive with other imbalanced learning classifiers.

Bing Zhou, Yiyu Yao, Qingzhong Liu
A Three-Way Decision Clustering Approach for High Dimensional Data

In this paper, we propose a three-way decision clustering approach for high-dimensional data. First, we propose a three-way K-medoids clustering algorithm, which produces clusters represented by three regions: objects in the positive region of a cluster certainly belong to the cluster, objects in the negative region definitely do not belong to it, and objects in the boundary region may belong to multiple clusters. Then, we propose a novel three-way decision clustering approach using the random projection method. The basic idea is to apply three-way K-medoids several times, increasing the dimensionality of the data after each iteration. Because the cluster centers obtained from one projection are used as the initial centers for the next, computing time is greatly reduced. Experimental results show that the proposed clustering algorithm is suitable for high-dimensional data and achieves higher accuracy without sacrificing computing time.

Hong Yu, Haibo Zhang
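The positive/boundary assignment described above can be sketched with the distance-ratio criterion known from rough k-means-style clustering; this is an illustrative stand-in for the paper's actual three-way K-medoids rule, and the function name and threshold `alpha` are assumptions:

```python
# Illustrative stand-in for a three-way cluster assignment rule, using a
# distance-ratio criterion: an object joins the positive region of its
# nearest cluster only if that medoid is clearly closer than the others;
# otherwise it falls into the boundary of every comparably close cluster.
# `alpha` is an assumed separation threshold, not a parameter from the paper.

def assign_three_way(point, medoids, dist, alpha):
    ds = [dist(point, m) for m in medoids]
    order = sorted(range(len(ds)), key=lambda i: ds[i])
    nearest = order[0]
    d_near = max(ds[nearest], 1e-12)           # guard against zero distance
    if ds[order[1]] / d_near > alpha:          # clearly separated
        return {'positive': nearest, 'boundary': []}
    return {'positive': None,
            'boundary': [i for i in order if ds[i] / d_near <= alpha]}

# 1-D example with medoids at 0 and 10 and alpha = 2:
d = lambda a, b: abs(a - b)
assign_three_way(1.0, [0.0, 10.0], d, 2.0)   # positive region of cluster 0
assign_three_way(5.0, [0.0, 10.0], d, 2.0)   # boundary of clusters 0 and 1
```

A point midway between two medoids ends up in both boundaries, which is precisely the "may belong to multiple clusters" case the abstract describes.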
Three-Way Decisions Based Multi-label Learning Algorithm with Label Dependency

A great number of algorithms have been proposed for multi-label learning, and these algorithms usually divide the labels with an optimal threshold according to their relevance to an unseen instance. However, directly deciding whether an unseen instance has a label whose relevance is close to the threshold may easily cause misclassification, since such labels carry high uncertainty. Three-way decision theory is an efficient method for handling this kind of uncertainty. Therefore, based on three-way decision theory, a multi-label learning algorithm with label dependency is proposed in this paper. Label dependency is an inherent property of multi-label data, and the labels with high uncertainty are further handled with a label dependency model, represented here by logistic regression. The experimental results show that this algorithm performs better.

Feng Li, Duoqian Miao, Wei Zhang
A Decision-Theoretic Rough Set Approach to Multi-class Cost-Sensitive Classification

As a kind of probabilistic rough set model, the decision-theoretic rough set is usually used to deal with binary classification problems. This paper provides a new formulation of multi-class decision-theoretic rough sets by combining the decision-theoretic rough set model with classical cost-sensitive learning. The upper approximation, lower approximation, positive region, negative region and boundary region can be derived from the $$n \times n$$ cost matrix of the classical multi-class setting. The probability thresholds for three-way decision making are defined, and a cost-sensitive classification algorithm based on the multi-class decision-theoretic rough set model is presented. The experimental results on several UCI data sets indicate that the proposed algorithm achieves better classification accuracy and lower total cost.

Guojian Deng, Xiuyi Jia
Research on Cost-Sensitive Method for Boundary Region in Three-Way Decision Model

The three-way decision theory (3WD) is constructed from the notions of acceptance, rejection and non-commitment, which can be directly generated by the three regions: the positive region (POS), negative region (NEG) and boundary region (BND). How to process the boundary region has become a hot topic in the field of three-way decision theory. Although several methods have been proposed to address this problem, most of them do not take cost-sensitive classification into consideration. In this paper, we adopt a cost-sensitive method to deal with the boundary region. Under the principle of reducing classification loss, we adjust the border distance between a boundary-region sample and its cover by introducing a cost-sensitive distance coefficient $$\eta$$, which is calculated automatically according to the distribution characteristics of the samples. Experimental results show that, compared with other models, our model obtains a high correct classification rate. Moreover, our model can reduce classification loss by improving the recall rate of high-cost samples when dealing with the boundary region.

Yanping Zhang, Gang Wang, Jie Chen, Liandi Fang, Shu Zhao, Ling Zhang, Xiangyang Wang
Determining Thresholds in Three-Way Decisions with Chi-Square Statistic

In an evaluation-function-based three-way decision model, a pair of thresholds divides a universal set into three regions, called a trisection or tri-partition of the universe: a region of objects whose values are at or above one threshold, a region of objects whose values are at or below the other threshold, and a region of objects whose values lie between the two thresholds. An optimization-based method for determining the pair of thresholds is to minimize or maximize an objective function that quantifies the quality, cost, or benefit of a trisection. In this paper, we use the chi-square statistic to interpret and establish an objective function in the context of classification. Maximizing the chi-square statistic searches for a strong correlation between the trisection and the classification.

Cong Gao, Yiyu Yao
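As a hedged illustration of the kind of objective function the abstract mentions (a sketch, not the authors' implementation), Pearson's chi-square can be computed on a contingency table whose rows are the regions of a trisection and whose columns are the decision classes:

```python
# Sketch: Pearson's chi-square statistic on a contingency table.
# Rows could be the three regions of a trisection, columns the classes;
# a larger value indicates stronger correlation between the two.

def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

# A partition perfectly aligned with two classes scores higher than an
# uninformative one:
chi_square([[10, 0], [0, 10]])    # → 20.0 (maximal for n = 20)
chi_square([[10, 10], [10, 10]])  # → 0.0
```

Searching over threshold pairs for the trisection that maximizes this statistic is the optimization step the paper formalizes.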
Optimistic Decision-Theoretic Rough Sets in Multi-covering Space

This paper discusses optimistic multigranulation decision-theoretic rough sets in multi-covering space. First, using the strategy of "seeking commonality while preserving difference", we propose the notion of optimistic multigranulation decision-theoretic rough sets on the basis of the Bayesian decision procedure. Then, we investigate some important properties of the model. Finally, we examine the relationships between the proposed model and other related rough set models.

Caihui Liu, Meizhi Wang

Fuzziness and Similarity in Knowledge Representation

Frontmatter
Interpretations of Lower Approximations in Inclusion Degrees

The nature of uncertainty inference is to give evaluations on inclusion relationships by means of various measures. In this paper we introduce the concept of inclusion degrees into rough set theory. It is shown that the lower approximations of the rough set theory in both the crisp and the fuzzy environments can be represented as inclusion degrees.

Wei-Zhi Wu, Chao-Jun Chen, Xia Wang
Multigranulation Rough Sets in Hesitant Fuzzy Linguistic Information Systems

Based on lower and upper approximations induced by multiple binary relations, multigranulation rough set theory has become one of the most promising research topics in the domain of rough set theory. Through combining multigranulation rough sets with hesitant fuzzy linguistic term sets, this article introduces a hybrid model of multigranulation rough sets, named a hesitant fuzzy linguistic (HFL) multigranulation rough set. In the framework of granular computing, we first give basic definitions of optimistic and pessimistic hesitant fuzzy linguistic multigranulation rough sets. Then, we explore some important properties about hesitant fuzzy linguistic multigranulation rough sets. Lastly, uncertainty measures for the hesitant fuzzy linguistic multigranulation approximation space are addressed.

Chao Zhang, De-Yu Li, Yan-Hui Zhai
Multi-granularity Similarity Measure of Cloud Concept

The cloud model achieves bidirectional transformation between qualitative concepts and quantitative values using the forward and backward cloud transformation algorithms. In a cognition process, the similarity measure of cloud concepts is a crucial issue, and traditional single-granularity similarity measures of cloud concepts fail to measure the similarity of multi-granularity concepts. Based on a combination of the Earth Mover's Distance (EMD) and the Kullback-Leibler Divergence (KLD), a multi-granularity similarity measure, EMDCM, based on Adaptive Gaussian Cloud Transformation (AGCT) is proposed. AGCT automatically realizes multi-granularity concept generation and uncertainty extraction between cloud models, while EMD is used to measure the similarity between different concepts. Experiments evaluating this method show its performance and validity.

Jie Yang, Guoyin Wang, Xukun Li
Multi-adjoint Concept Lattices, Preferences and Bousi Prolog

The use of preferences is common in natural language and must be taken into account in the diverse theoretical frameworks focused on knowledge management in databases. This paper exploits the possibility of considering preferences in a (discrete) fuzzy concept lattice framework.

M. Eugenia Cornejo, Jesús Medina, Eloísa Ramírez-Poussa, Clemente Rubio-Manzano
Modified Generalized Weighted Fuzzy Petri Net in Intuitionistic Fuzzy Environment

In this paper, a modification of the generalized weighted fuzzy Petri net in an intuitionistic fuzzy environment is proposed with the help of an inverted fuzzy implication as an output operator in the operator binding function. It provides a way to optimize the truth values at the output places. Approximate reasoning algorithms for such Petri nets are proposed, and a numerical example is provided to logically establish the proposed theory.

Sibasis Bandyopadhyay, Zbigniew Suraj, Piotr Grochowalski

Machine Learning and Decision Making

Frontmatter
Similarity-Based Classification with Dominance-Based Decision Rules

We consider a similarity-based classification problem in which a new case (object) is classified based on its similarity to some previously classified cases. In this process of case-based reasoning (CBR), we adopt the Dominance-based Rough Set Approach (DRSA), which is able to handle the monotonic relationship "the more similar object y is to object x with respect to the considered features, the closer y is to x in terms of membership to a given decision class X". At the level of marginal similarity concerning single features, we consider similarity in ordinal terms only. The marginal similarities are aggregated within induced decision rules describing the monotonic relationship between the comprehensive similarity of objects and their similarities with respect to single features.

Marcin Szeląg, Salvatore Greco, Roman Słowiński
Representative-Based Active Learning with Max-Min Distance

Active learning has become a hot topic because labeled data are useful but expensive to obtain. Many existing approaches are based on decision trees, naïve Bayes algorithms, etc. In this paper, we propose a representative-based active learning algorithm with max-min distance. Our algorithm has two interacting techniques: representative-based classification inspired by covering-based neighborhood rough sets, and critical instance selection with max-min distance. Experimental results on six UCI datasets indicate that, with the same number of labeled instances, our algorithm is comparable with or better than the ID3, C4.5 and naïve Bayes algorithms.

Fu-Lun Liu, Fan Min, Liu-Ying Wen, Hong-Jie Wang
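Max-min (farthest-first) instance selection, the selection technique the abstract names, can be sketched as a greedy loop in which each pick maximizes its minimum distance to everything already labeled or already selected. This is an illustrative generic version; the function name and the 1-D example are my own, not the paper's:

```python
# Illustrative greedy max-min (farthest-first) selection for active
# learning: each pick maximizes the minimum distance to the set of
# already-labeled and already-selected instances.

def max_min_select(unlabeled, labeled, dist, k):
    pool = list(unlabeled)
    anchors = list(labeled)       # must be non-empty to seed the selection
    chosen = []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda x: min(dist(x, a) for a in anchors))
        chosen.append(best)
        anchors.append(best)
        pool.remove(best)
    return chosen

# 1-D example: starting from a labeled point at 0, the farthest candidate
# (10) is picked first, then the candidate farthest from both 0 and 10.
d = lambda a, b: abs(a - b)
max_min_select([1, 2, 9, 10], [0], d, 2)   # → [10, 2]
```

The greedy rule spreads queried instances across the input space, which is why it pairs naturally with representative-based classification.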
Fuzzy Multi-label Classification of Customer Complaint Logs Under Noisy Environment

Analyzing and understanding customer complaints has become an important issue in almost all enterprises. One of the key factors involved is to automatically identify and analyze the different causes of the complaints. A single complaint may belong to multiple complaint domains, with fuzzy associations to each of the different domains; thus, single-label or multi-class classification techniques may not be suitable for classifying such complaint logs. In this paper, we analyze and classify customer complaints of some of the leading telecom service providers in India. Accordingly, we adopt a fuzzy multi-label text classification approach, along with different language-independent statistical features, to address the above-mentioned issue. Our evaluation shows that combining pointwise mutual information with unigram features returns the best results.

Tirthankar Dasgupta, Lipika Dey, Ishan Verma
Some Weighted Ranking Operators with Interval Valued Intuitionistic Fuzzy Information Applied to Outsourced Software Project Risk Assessment

Some weighted ranking operators for interval-valued intuitionistic fuzzy sets (IVIFS) are presented in this paper. By analyzing the intervals of membership degree, non-membership degree and hesitant degree, we provide two types of weighted ranking operators with IVIFS information and prove some of their mathematical properties. Finally, a multiple attribute decision-making example applied to outsourced software project risk assessment is given to demonstrate the application of this method. The simulation results show that the two-dimensional operator with IVIFS information is more effective than the three-dimensional operator.

Zhen-hua Zhang, Yong Hu, Zhao Chen, Shenguo Yuan, Kui-xi Xiao
Formal Analysis of HTM Spatial Pooler Performance Under Predefined Operation Conditions

This paper introduces a mathematical formalism for the Spatial Pooler (SP) of Hierarchical Temporal Memory (HTM), with special consideration for its hardware implementation. The performance of an HTM network and its ability to learn and adjust to a problem at hand are governed by a large set of parameters. Most of the parameters are codependent, which makes creating efficient HTM-based solutions challenging: it requires profound knowledge of the settings and their impact on system performance. Consequently, this paper introduces a set of formulas that facilitate the design process by enhancing the tedious trial-and-error method with a tool for choosing initial parameters which enable quick learning convergence. This is especially important in hardware implementations, which are constrained by the limited resources of a platform.

Marcin Pietroń, Maciej Wielgosz, Kazimierz Wiatr
Performance Comparison to a Classification Problem by the Second Method of Quantification and STRIM

STRIM is used for inducing if-then rules hidden behind a database called the decision table, while the second method of quantification is often used for summarizing and arranging such a database. This paper first summarizes both methods, then compares their performance on a learning and classification problem by applying them to a simulated dataset, and lastly considers their features and clarifies the differences between them based on the simulation results.

Yuya Kitazaki, Tetsuro Saeki, Yuichi Kato
Outlier Detection and Elimination in Stream Data – An Experimental Approach

In this paper the issue of outlier detection and substitution (correction) in stream data is raised. Previous research showed that even a small number of outliers in the data significantly degrades the quality of the applied prediction model. Here we try to find a proper composite method of outlier processing for stream data. The procedure consists of a method of outlier detection, a statistic used to replace the outlying values, and a historic horizon over which the replacing value is calculated. To find the best strategy, a wide grid of experiments was prepared. All experiments were performed on semi-artificial data: data coming from the underground coal mining environment with an artificially introduced dependent variable and randomly introduced outliers. A new approach to local outlier correction is presented which in several cases improved the classification quality.

Mateusz Kalisch, Marcin Michalak, Piotr Przystałka, Marek Sikora, Łukasz Wróbel
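The three-part procedure the abstract describes (a detection method, a replacement statistic, a historic horizon) can be illustrated with a minimal sketch. This is not the authors' procedure; the detection rule (3-sigma against a sliding window) and the median replacement are assumptions chosen for illustration.

```python
from collections import deque
import statistics

def clean_stream(stream, horizon=20, threshold=3.0):
    """Correct outliers on the fly: a value farther than `threshold`
    standard deviations from the mean of the last `horizon` values is
    substituted by the window median (detection + statistic + horizon)."""
    window = deque(maxlen=horizon)   # the historic horizon
    out = []
    for x in stream:
        if len(window) >= 3:
            mu = statistics.fmean(window)
            sd = statistics.stdev(window)
            if sd > 0 and abs(x - mu) > threshold * sd:
                x = statistics.median(window)   # replace, don't just drop
        window.append(x)
        out.append(x)
    return out
```

Replacing rather than dropping the value keeps the stream length intact, which matters when a downstream prediction model expects one value per time step.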

Ranking and Clustering

Frontmatter
Exploiting Group Pairwise Preference Influences for Recommendations

Recommender systems typically recommend items to a user based on predicted ratings. However, due to the biases of different users, it is not easy to infer a user's preference from the predicted ratings alone. This paper defines a user preference relationship based on the user's ratings to improve recommendation accuracy. By considering group information, we extend the preference relationship to four types of correlations: (user, item), (user group, item), (user, item group), and (user group, item group). This paper then exploits pair-wise comparisons between two items or two groups of items for a single user or a group of users. The gradient descent algorithm is used to learn latent factors on partial orders to make recommendations. Experimental results show the effectiveness of the proposed method.

Kunlei Zhu, Jiajin Huang, Ning Zhong
Discrete Group Search Optimizer for Community Detection in Social Networks

Discovering community structure in complex networks has been intensively investigated in recent years. Community detection can be treated as an optimization problem in which an objective fitness function is optimized. Intuitively, the fitness function captures subgraphs of the network that have densely connected nodes with sparse connections between subgraphs. In this paper, we propose the Discrete Group Search Optimizer (DGSO), an efficient optimization algorithm that solves the community detection problem without any prior knowledge of the number of communities. The proposed DGSO algorithm adopts the locus-based adjacency representation and several discrete operators. Experiments on real-life networks show the capability of the proposed algorithm to detect the structure hidden within complex networks, compared with other high-performance algorithms in the literature.

Moustafa Mahmoud Ahmed, Mohamed M. Elwakil, Aboul Ella Hassanien, Ehab Hassanien
Social Web Videos Clustering Based on Ensemble Technique

Currently, the massive amount of online videos has made social web video mining a challenging research area. Clustering ensembles are a common approach to clustering problems, combining a collection of clusterings into a superior solution. Textual features are widely used to describe a web video, but local and global visual features have their own advantages as well. We therefore extract local and global features, which we call low-level/semantic and high-level/visual features respectively, to better describe the source videos. In this paper, we propose a function combining three similarity models to enhance the similarity values of videos, and then present a framework for Clustering Ensemble with the support of Must-Link constraints (CE-ML) for ensemble clustering. Experimental evaluation on real-world social web videos has been performed to validate the proposed framework.

Vinath Mekthanavanh, Tianrui Li
Learning Latent Features for Multi-view Clustering Based on NMF

Multi-view data, coming from multiple sources or presented in multiple forms, carry more information than single-view data, and multi-view clustering benefits from exploiting this additional information. Nonnegative matrix factorization (NMF) is an efficient method for learning a low-rank approximation of a nonnegative data matrix, but on its own it may not be good at clustering. This paper presents a novel multi-view clustering algorithm (called MVCS) that properly combines similarity and NMF. It aims to obtain latent features shared by multiple views through factorizations yielding a common factor matrix attained from the views and a common similarity matrix. Moreover, according to the reconstruction precision of the data matrices, MVCS can adaptively learn the view weights. Experiments on real-world data sets demonstrate that our approach effectively facilitates multi-view clustering and yields superior clustering results.

Mengjiao He, Yan Yang, Hongjun Wang
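The core idea of a common factor matrix shared across views can be sketched with plain multiplicative-update NMF. This is a generic illustration of the shared-factor formulation X_v ≈ W H_v, not the MVCS algorithm itself (which additionally uses a common similarity matrix and adaptive view weights); all names and parameters here are assumed.

```python
import numpy as np

def multiview_nmf(views, k, iters=200, eps=1e-9, seed=0):
    """Factorize each nonnegative view X_v ≈ W @ H_v with the factor
    matrix W shared across views; rows of W act as latent memberships."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in views]
    for _ in range(iters):
        for v, X in enumerate(views):           # per-view coefficient update
            Hs[v] *= (W.T @ X) / (W.T @ W @ Hs[v] + eps)
        num = sum(X @ H.T for X, H in zip(views, Hs))
        den = sum(W @ H @ H.T for H in Hs) + eps
        W *= num / den                          # shared-factor update
    return W, Hs
```

Because W appears in every view's reconstruction, its update aggregates evidence from all views, which is what lets the latent features reflect structure common to the views rather than any single one.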
A Semantic Overlapping Clustering Algorithm for Analyzing Short-Texts

The rise in volumes of digitized short-texts, like tweets or customer complaints and opinions about products and services, poses new challenges to established methods of text analytics due to both the sparseness of the text and noise. In this paper we present a new semantic clustering algorithm that first discovers frequently occurring semantic concepts within a repository and then clusters the documents around these concepts based on the concept distribution within them. The method produces overlapping clusters, which gives a far more accurate view of the content embedded within real-life communication texts. We compare the clustering results with LSH-based clustering and show that the proposed method produces fewer overall clusters with more semantic coherence within each cluster.

Lipika Dey, Kunal Ranjan, Ishan Verma, Abir Naskar

Derivation and Application of Rules and Trees

Frontmatter
On Behavior of Statistical Indices in Incremental Context

This paper illustrates the behavior of statistical indices for rule induction, namely accuracy, coverage and lift, when an additional example arrives incrementally. Whereas accuracy and coverage behave monotonically, lift behaves so only if some additional constraints are satisfied, which have to be taken care of in incremental rule induction.

Shusaku Tsumoto, Shoji Hirano
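For a rule C → D over n examples, the three indices mentioned above are standard contingency-table quantities: accuracy is P(D|C), coverage is P(C|D), and lift is accuracy divided by the prior P(D). The small sketch below (a hypothetical helper, not from the paper) shows how they are computed from the counts, which is what makes it possible to track them as each new example updates the counts.

```python
def rule_indices(n_cd, n_c, n_d, n):
    """Statistical indices of a rule C -> D over n examples, where
    n_c = |C|, n_d = |D|, n_cd = |C and D|."""
    accuracy = n_cd / n_c              # confidence: P(D|C)
    coverage = n_cd / n_d              # P(C|D)
    lift = (n_cd * n) / (n_c * n_d)   # accuracy / P(D)
    return accuracy, coverage, lift
```

Adding an example that satisfies both C and D increments n_cd, n_c, n_d and n at once; accuracy and coverage then change monotonically, while lift also depends on the shifting prior P(D), which is why it needs extra constraints to behave monotonically.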
Linguistic Variables Construction in Information Table

Application of attribute-oriented generalization to an information table often leads to inconsistent results of rule induction, which can be viewed as generation of fuzziness through partialization of attribute information. This paper focuses on fuzzy linguistic variables and proposes a solution for these inconsistencies. The results show that domain ontology may play an important role in the construction of linguistic variables.

Shusaku Tsumoto, Shoji Hirano
Multi-objective Search for Comprehensible Rule Ensembles

We present a methodology for constructing an ensemble of rule-based classifiers characterized not only by good classification accuracy but also by good quality of knowledge representation. The base classifiers forming the ensemble are composed of minimal sets of rules that cover the training objects while being relevant for their high support, low anti-support and high Bayesian confirmation measure. The population of base classifiers evolves in the course of a bi-objective optimization procedure involving classification accuracy and diversity of the base classifiers. The final population constitutes an ensemble classifier enjoying some desirable properties, as shown in a computational experiment.

Jerzy Błaszczyński, Bartosz Prusak, Roman Słowiński
On NIS-Apriori Based Data Mining in SQL

We have proposed a framework of Rough Non-deterministic Information Analysis (RNIA) for tables with non-deterministic information, and applied RNIA to analyzing tables with uncertainty. We have also developed an RNIA software tool in Prolog and getRNIA in Python; in addition to these two tools, we now consider an RNIA software tool in SQL for handling large data sets. This paper reports the current state of the prototype, named NIS-Apriori in SQL, which will afford a more convenient environment for data analysis.

Hiroshi Sakai, Chenxi Liu, Xiaoxin Zhu, Michinori Nakata
Classification for Inconsistent Decision Tables

Decision trees have been widely used to discover patterns in consistent data sets. But if the data set is inconsistent, containing groups of examples with equal values of conditional attributes but different labels, then discovering the essential patterns or knowledge becomes challenging. Three approaches (generalized, most common and many-valued decision) have been considered to handle such inconsistency, and the decision tree model has been used to compare the classification results among them. The many-valued decision approach outperforms the others, and the M_ws_entM greedy algorithm gives faster and better prediction accuracy.

Mohammad Azad, Mikhail Moshkov
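The inconsistency the abstract describes, and the many-valued decision approach to it, can be made concrete with a short sketch: group examples by their conditional attribute values and attach to each group the set of labels it carries. The function names are hypothetical; this only illustrates the notion of inconsistency, not the tree-construction algorithms compared in the paper.

```python
from collections import defaultdict

def many_valued_decisions(rows):
    """Group examples (cond_1, ..., cond_m, label) by their conditional
    attribute values; each group gets the *set* of its labels, which is
    the many-valued decision attached to that combination of conditions."""
    groups = defaultdict(set)
    for *conditions, label in rows:
        groups[tuple(conditions)].add(label)
    return dict(groups)

def is_consistent(rows):
    """A decision table is consistent iff no group of equal-condition
    examples carries more than one label."""
    return all(len(labels) == 1 for labels in many_valued_decisions(rows).values())
```

Under the generalized or most-common approaches such a group would instead be collapsed to one (possibly special) decision; keeping the whole label set is what distinguishes the many-valued approach.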

Derivation and Application of Feature Subsets

Frontmatter
Feature Selection in Decision Systems with Constraints

In this paper we discuss the attribute reduction problem for a decision system with constraints. We present a new concept of a decision system with constraints and a concept of a constrained reduct defined for such a system. We define the feature reduction problem for such constrained systems and propose some heuristics for constrained reduct calculation and feature selection. We illustrate possible benefits of the proposed approach with an example based on stream data obtained from sensor arrays in coal mines.

Sinh Hoa Nguyen, Marcin Szczuka
Governance of the Redundancy in the Feature Selection Based on Rough Sets’ Reducts

In this paper we introduce a novel approach to feature selection based on the theory of rough sets. We define the concept of redundant reducts, whereby data analysts can limit the size of data and control the level of redundancy in generated subsets of attributes while maintaining the discernibility of all objects, even in the case of partial data loss. What is more, we provide an analysis of the computational complexity and a proof of NP-hardness of the n-redundant super-reduct problem.

Marek Grzegorowski
Attribute Reduction in Multi-source Decision Systems

Processing information from different sources is a hot research topic in contemporary data analysis. Attribute reduction methods for multi-source decision systems (MSDS) are proposed in this paper. Firstly, based on full preservation of the original effective information, a consistent attribute reduction of the multi-source decision system is proposed. Secondly, in the case of a certain loss of original effective information, the data is compressed by fusion based on conditional entropy, and attribute reduction preserving knowledge is then studied in the decision system obtained by this fusion. Examples are introduced to further elaborate the theory proposed in this paper.

Yanting Guo, Weihua Xu
Matrix Algorithm for Distribution Reduction in Inconsistent Ordered Information Systems

As part of ongoing work on ordered information systems, distribution reduction is studied in inconsistent ordered information systems. The dominance matrix is restated for reduct acquisition under dominance relations in information systems. A matrix algorithm for distribution reduction acquisition is developed step by step, and a program implementing the algorithm is provided. The approach offers an effective tool for theoretical research on and practical applications of ordered information systems. Detailed illustrative cases are employed to explain and verify the algorithm and the program, showing the effectiveness of the algorithm in complicated information systems.

Xiaoyan Zhang, Ling Wei
Consistency Driven Feature Subspace Aggregating for Ordinal Classification

We present a new method for constructing an ensemble classifier for ordinal classification with monotonicity constraints. Ordinal consistency driven feature subspace aggregating (coFeating) constructs local component classification models instead of global ones, which are more common in ensemble methods. The training classification data are first structured using Variable Consistency Dominance-based Rough Set Approach (VC-DRSA). Then, coFeating constructs local classification models in subregions of the attribute space, which is divided with respect to consistency of objects. Our empirical evaluation shows that coFeating performs significantly better than previously proposed ensemble methods on data characterized by a high number of objects and/or attributes.

Jerzy Błaszczyński, Jerzy Stefanowski, Roman Słowiński
Backmatter
Metadata
Title
Rough Sets
Editors
Víctor Flores
Fernando Gomide
Andrzej Janusz
Claudio Meneses
Duoqian Miao
Georg Peters
Dominik Ślęzak
Guoyin Wang
Richard Weber
Yiyu Yao
Copyright Year
2016
Electronic ISBN
978-3-319-47160-0
Print ISBN
978-3-319-47159-4
DOI
https://doi.org/10.1007/978-3-319-47160-0
