
About this Book

This book constitutes the refereed conference proceedings of the 15th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, RSFDGrC 2015, held in Tianjin, China, in November 2015 as one of the co-located conferences of the 2015 Joint Rough Set Symposium, JRS 2015.

The 44 papers were carefully reviewed and selected from 97 submissions. They are organized in topical sections on rough sets: the experts speak; generalized rough sets; rough sets and graphs; rough and fuzzy hybridization; granular computing; data mining and machine learning; three-way decisions; and the IJCRS 2015 data challenge.

Table of Contents

Frontmatter

Correction to: Building Granular Systems - from Concepts to Applications

The acknowledgement section of this paper originally referred to grant DEC-2013/09/B/ST6/01568. The reference to this grant has been removed from the acknowledgement section at the request of one of the authors.

Marcin Szczuka, Andrzej Jankowski, Andrzej Skowron, Dominik Ślęzak

Rough Sets: The Experts Speak

Frontmatter

Decision-Oriented Rough Set Methods

Rough set theory is a very effective tool for multi-attribute decision analysis. The paper reviews four decision-oriented rough set models and methods: the dominance-based rough set, three-way decisions, the multigranulation decision-theoretic rough set, and a rough-set-based multi-attribute group decision-making model. We also introduce some of our group's work under these four models. Several future research directions for decision-oriented rough sets are presented at the end of the paper.

Jiye Liang

On Generalized Decision Functions: Reducts, Networks and Ensembles

We summarize our observations on utilizing generalized decision functions to define dependencies between attributes in decision systems. We refer to well-known criteria for attribute selection and to less-known results linking generalized decisions with the notions of multivalued dependency and conditional independence. We formulate the problem of finding the simplest ensembles of subsets of attributes which allow us to retrieve the original decision values of the considered objects by intersecting the sets of possible decisions induced by particular attributes.

Dominik Ślęzak

Formalization of Medical Diagnostic Rules

This paper discusses the formalization of medical diagnostic rules, which is closely related to the rough set rule model. The important point is that medical diagnostic reasoning is characterized by a focusing mechanism, composed of screening and differential diagnosis, which correspond to the upper and lower approximations of a target concept. Furthermore, this paper focuses on the detection of complications, which can be viewed as relations between the rules of different diseases.

Shusaku Tsumoto, Shoji Hirano

Multi-granularity Intelligent Information Processing

Multi-granularity thinking, computation and problem solving are effective approaches for human beings to deal with complex and difficult problems. Deep learning, as a successful example of a multi-granularity computation model, has made significant progress in the fields of face recognition, automatic image labeling, speech recognition, and so on. Its idea can be generalized as a model of solving problems by joint computing on multi-granular information/knowledge representation (MGrIKR) from the perspective of granular computing (GrC). This paper introduces our research on constructing MGrIKR from original datasets and its application in big data processing. Firstly, we survey the study of multi-granular computing (MGrC), including the four major theoretical models (rough sets, fuzzy sets, quotient space, and cloud model) for MGrC. Then we introduce five representative methods for constructing MGrIKR, based on rough sets, computing with words (CW), fuzzy quotient space based on information entropy, adaptive Gaussian cloud transformation (A-GCT), and multi-granularity clustering based on density peaks, respectively. At last we present an MGrC-based big data processing framework, in which MGrIKR is built and taken as the input of other machine learning and data mining algorithms.

Guoyin Wang, Ji Xu, Qinghua Zhang, Yuchao Liu

Granular Structures Induced by Interval Sets and Rough Sets

An interval set is a family of sets restricted by an upper bound and a lower bound. Interval-set algebras are concrete models of granular computing. The triarchic theory of granular computing focuses on multilevel and multi-view granular structures. This paper discusses granular structures of interval sets under inclusion relations between two interval sets from a measurement-theoretic perspective and a set-theoretic perspective, respectively. From the measurement-theoretic perspective, the paper discusses preferences on two objects represented by interval sets under inclusion relations on interval sets. From the set-theoretic perspective, it uses different inclusion relations and operations on interval sets to construct multilevel and multi-view granular structures.
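As background for this abstract, an interval set and a component-wise inclusion relation on interval sets can be sketched as follows; the representation (pairs of frozensets) and this particular inclusion relation are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of interval sets over a finite universe, with a
# component-wise inclusion relation. Representation is an assumption:
# an interval set is a pair (lower, upper) of frozensets.

def is_interval_set(lower, upper):
    # An interval set [lower, upper] requires lower ⊆ upper.
    return lower <= upper

def includes(a, b):
    # Component-wise inclusion: [a_l, a_u] ⊑ [b_l, b_u]
    # iff a_l ⊆ b_l and a_u ⊆ b_u.
    (al, au), (bl, bu) = a, b
    return al <= bl and au <= bu

A = (frozenset({1}), frozenset({1, 2, 3}))
B = (frozenset({1, 2}), frozenset({1, 2, 3, 4}))
print(is_interval_set(*A))  # True
print(includes(A, B))       # True
```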

Ning Zhong, Jia-jin Huang

Generalized Rough Sets

Frontmatter

Empirical Risk Minimization for Variable Consistency Dominance-Based Rough Set Approach

The paper concerns reasoning about partially inconsistent ordinal data. We reveal the relation between Variable Consistency Dominance-based Rough Set Approach (VC-DRSA) and the empirical risk minimization problem. VC-DRSA is an extension of DRSA that admits some degree of inconsistency with respect to dominance, which is controlled by thresholds on a consistency measure. To prove this relation, we first solve an optimization problem to find thresholds ensuring assignment of a maximum number of objects under disjoint and balanced setting of extended lower approximations of two complementary unions of ordered decision classes: "at least" class $i$ and "at most" class $i-1$, for a given $i \in \{2, \ldots, p\}$, where $p$ is the total number of classes. For a given $i$, each object is supposed to be assigned to at most one of the two extended lower approximations. Moreover, the assignment is not influenced by unions' cardinalities. Second, we prune the set of objects not assigned to any extended lower approximation. Then, from a definable set, for a given $i$, we derive a classification function, which indicates assignment of an object to one of the two unions of decision classes. We define empirical risk associated with the classification function as a hinge loss function. We prove that the classification function minimizing the empirical risk function corresponds to the extended lower approximation in VC-DRSA involving thresholds obtained from the above optimization problem, followed by the pruning.

Jerzy Błaszczyński, Yoshifumi Kusunoki, Masahiro Inuiguchi, Roman Słowiński

Rough Set Approximations in Multi-scale Interval Information Systems

From the viewpoint of granular computing, the notion of a granule may be interpreted as one of the numerous small particles forming a larger unit. There are different granules at different levels of scale in data sets having hierarchical structures, and human beings often observe objects or deal with data hierarchically structured at different levels of granulation. In real-world applications, there may also exist multiple types of data in interval information systems. Therefore, the concept of multi-scale interval information systems is first introduced in this paper. The lower and upper approximations in multi-scale interval information systems are then defined, and the accuracy and the roughness are also explored. Monotonic properties of these rough set approximations at different levels of granulation are analyzed with illustrative examples.
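The single-scale lower and upper approximations that the multi-scale construction generalizes can be sketched as follows; the toy decision table and function names are illustrative, not from the paper.

```python
# A hedged sketch of classical (single-scale) rough set approximations:
# equivalence classes from an attribute projection, then lower/upper
# approximations of a target concept. Names and data are illustrative.

from collections import defaultdict

def equivalence_classes(objects, attrs):
    # Group objects by their values on the chosen attributes.
    blocks = defaultdict(set)
    for obj, values in objects.items():
        blocks[tuple(values[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(blocks, target):
    lower, upper = set(), set()
    for b in blocks:
        if b <= target:
            lower |= b  # block entirely inside the target concept
        if b & target:
            upper |= b  # block overlapping the target concept
    return lower, upper

objects = {1: {'a': 0}, 2: {'a': 0}, 3: {'a': 1}, 4: {'a': 1}}
lo, up = approximations(equivalence_classes(objects, ['a']), {1, 2, 3})
print(lo, up)  # {1, 2} {1, 2, 3, 4}; accuracy = |lo|/|up| = 0.5
```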

Shen-Ming Gu, Ya-Hong Wan, Wei-Zhi Wu, Tong-Jun Li

A New Subsystem-Based Definition of Generalized Rough Set Model

The generalization of the Pawlak rough set model has long attracted the attention of researchers in the rough set community. In this paper, we propose a new subsystem-based definition of the generalized rough set model and disclose its corresponding properties. We also discuss the interrelationships between our definition and the existing ones; the results show that our definition is effective and reasonable.

Caihui Liu, Meizhi Wang, Yanfei Dai, Yueli Luo

A Comparison of Two Types of Covering-Based Rough Sets Through the Complement of Coverings

In recent years, many types of covering-based rough set models have been established, and the study of their relationships has been a hot research topic. The covering-based rough sets defined by Zhu and those defined by Xu and Zhang were compared with each other through binary relations, and the relationships between these two types were further explored by introducing the concept of a complementary neighborhood. However, the essential connections between the two types have not been revealed. In this paper, we consider these two types of covering-based rough sets by introducing the notion of the complement of a covering. In fact, the two types can be expressed by each other through the complement of coverings. Based on these results, we analyze the notion of the extension of a covering, which is introduced on the basis of the complement of the covering. Finally, we further explore the structure of these types of covering-based rough set models. This study suggests some internal connections between covering-based rough sets and demonstrates a new research direction for them.

Yanfang Liu, William Zhu

On the Nearness Measures of Near Sets

In this paper, we focus our discussion on the nearness measures of the near set approach. Some existing nearness measures are normalized, with their basic properties being discussed. The connections between nearness relations and rough approximations are surveyed. The notions of strong nearness relations with respect to the indiscernibility relation and the weak indiscernibility relation are introduced. Some new nearness measures with respect to nearness relations and strong nearness relations are presented.

Keyun Qin, Bo Li

Topological Properties for Approximation Operators in Covering Based Rough Sets

We investigate which approximation operators are closure operators and topological closure operators within a framework of sixteen pairs of dual approximation operators for the study of covering-based rough sets. We extend previous results about approximation operators related to closure operators.

Mauricio Restrepo, Jonatan Gómez

Rough Sets and Graphs

Frontmatter

Preclusivity and Simple Graphs

The adjacency relation of a simple undirected graph is a preclusive (irreflexive and symmetric) relation. Hence, it originates a preclusive space enabling us to define the lower and upper preclusive approximations of graphs and two orthogonality graphs. Further, the possibility of defining the similarity lower and upper approximations and the sufficiency operator on graphs will be investigated, with particular attention to complete and bipartite graphs. All these mappings will be put in relation with Formal Concept Analysis and the theory of opposition.
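The preclusivity property itself (irreflexivity plus symmetry of the adjacency relation) is easy to check directly; the small sketch below uses an explicit edge-pair representation as an illustrative assumption.

```python
# A small sketch checking that a simple graph's adjacency relation is
# preclusive (irreflexive and symmetric). The relation is represented
# as a set of ordered pairs, with both directions stored explicitly.

def is_preclusive(vertices, adj):
    irreflexive = all((v, v) not in adj for v in vertices)
    symmetric = all((v, u) in adj for (u, v) in adj)
    return irreflexive and symmetric

V = {1, 2, 3}
E = {(1, 2), (2, 1), (2, 3), (3, 2)}  # simple undirected graph
print(is_preclusive(V, E))  # True
```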

Giampiero Chiaselotti, Davide Ciucci, Tommaso Gentile, Federico Infusino

Preclusivity and Simple Graphs: The n-cycle and n-path Cases

Two classes of graphs, the n–cycles and n–paths, are interpreted as preclusivity spaces. In this way, it is possible to define two pairs of approximations on them: one based on a preclusive relation and another one based on a similarity relation. Further, two relations can be defined among the set of vertices and they define two different graphs, which are here studied.

Giampiero Chiaselotti, Davide Ciucci, Tommaso Gentile, Federico Infusino

Connectedness of Graph and Matroid by Covering-Based Rough Sets

Covering-based rough sets provide an efficient theory for processing information in data mining. Matroid theory is a generalization of both linear algebra and graph theory, and has a variety of applications in many fields, including covering-based rough sets. In this paper, we study the connectedness of graphs and matroids through covering-based rough sets. First, we present an approach to induce a covering from a graph. Then we use the covering upper approximation operator and the rank of the matrix representation of the covering to study the connectedness of the graph, and we give an expression for the number of connected components of a graph. Second, we establish a matroid based on the covering induced by a graph and study the connectedness of this matroid.
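For reference, the number of connected components that the paper expresses via the covering's matrix rank can also be obtained by plain graph search; this sketch is an illustrative baseline, not the paper's method.

```python
# A sketch counting connected components of an undirected graph with a
# plain depth-first search; the graph is an adjacency dict.

def connected_components(graph):
    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1           # a new, previously unreached component
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(graph[v])
    return count

g = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}
print(connected_components(g))  # 3
```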

Hui Li, William Zhu

Controllability in Directed Complex Networks: Granular Computing Perspective

Controlling complex networks toward a desired state is a widespread concern in contemporary science. Usually, we seek a maximum matching of a complex network by matching theory and control the unmatched nodes to achieve the purpose of controlling the network. However, for complex networks of high dimension, it is hard to find the maximum matching, or there are numerous unmatched nodes that need to be controlled, so controlling complex networks is extremely strenuous. Motivated by the idea of granular computing (GrC), we apply a fine-graining preprocessing step to the whole complex network and obtain several different granules. We then find the maximum matching in every granule and control the unmatched nodes to achieve the goal of controlling the entire network. Finally, the related key problems in the GrC-based framework for processing the controllability of complex networks are discussed.
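The matching-based view of controllability sketched below is the standard one: each directed edge links the "out" copy of its source to the "in" copy of its target, and nodes whose "in" copy remains unmatched are taken as driver nodes. The augmenting-path implementation and all names are illustrative, not from the paper.

```python
# A hedged sketch of maximum matching on the bipartite representation
# of a directed network, via Kuhn's augmenting-path algorithm.

def max_matching(edges, nodes):
    match_in = {}  # "in" copy of node -> node whose "out" copy matches it

    def augment(u, visited):
        # Try to match the "out" copy of u along an augmenting path.
        for v in edges.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            if v not in match_in or augment(match_in[v], visited):
                match_in[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    drivers = [n for n in nodes if n not in match_in]  # unmatched nodes
    return match_in, drivers

edges = {1: [2], 2: [3], 4: [3]}
matching, drivers = max_matching(edges, [1, 2, 3, 4])
print(drivers)  # [1, 4]
```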

Yunyun Yang, Gang Xie, Zehua Chen

Rough and Fuzzy Hybridization

Frontmatter

Dynamic Maintenance of Rough Fuzzy Approximations with the Variation of Objects and Attributes

In many fields, including medical research, e-business and road transportation, data may vary over time, i.e., new objects and new attributes are added. In this paper, we present a method for dynamically updating approximations based on rough fuzzy sets under the simultaneous variation of objects and attributes in fuzzy decision systems. Firstly, a matrix-based approach is proposed to construct the rough fuzzy approximations on the basis of the relation matrix. Then a method for incrementally computing approximations is presented, which involves partitioning the relation matrix and partly changing its element values based on the prior matrices' information. Finally, an illustrative example is employed to validate the effectiveness of the proposed method.

Yanyong Huang, Tianrui Li, Shi-jinn Horng

Semi-Supervised Fuzzy-Rough Feature Selection

With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough set-based feature selection techniques offer great flexibility when dealing with real-valued and noisy data; however, most of the current approaches focus on the supervised domain where the data object labels are known. Very little work has been carried out using fuzzy-rough sets in the areas of unsupervised or semi-supervised learning. This paper proposes a novel approach for semi-supervised fuzzy-rough feature selection where the object labels in the data may only be partially present. The approach also has the appealing property that any generated subsets are also valid (super)reducts when the whole dataset is labelled. The experimental evaluation demonstrates that the proposed approach can generate stable and valid subsets even when up to 90 % of the data object labels are missing.

Richard Jensen, Sarah Vluymans, Neil Mac Parthaláin, Chris Cornelis, Yvan Saeys

Modified Generalised Fuzzy Petri Nets for Rule-Based Systems

In [10], generalised fuzzy Petri nets were proposed. This class extends the existing fuzzy Petri nets by introducing three input/output operators in the form of triangular norms, which are supposed to function as substitutes for the classical min, max, and * (the algebraic product) operators. In this paper, we describe so-called modified generalised fuzzy Petri nets. A functional interpretation of transitions based on inverted fuzzy implications is added to the model. The proposed net model is not only more comfortable in terms of knowledge representation but, most of all, more effective in modelling approximate reasoning, as in the new class of fuzzy Petri nets the user has the chance to define both the input/output operators and the transition operators. To demonstrate the power and usefulness of this model, an application of the modified generalised fuzzy Petri nets in the domain of train traffic control is provided. The proposed approach can be used for knowledge representation and reasoning in decision support systems.

Zbigniew Suraj

Fuzzy Rough Decision Trees for Multi-label Classification

Multi-label classification arises widely in medical analysis and image annotation. Although there are some algorithms to train models for multi-label classification, few of them are able to extract comprehensible rules. In this paper, we propose a multi-label decision tree algorithm based on fuzzy rough sets, named ML-FRDT. This method can handle symbolic, continuous and fuzzy data. We conduct experiments on two multi-label datasets, and the experimental results show that ML-FRDT achieves better performance than some well-established multi-label classification algorithms.

Xiaoxue Wang, Shuang An, Hong Shi, Qinghua Hu

Axiomatic Characterizations of Reflexive and $\mathcal{T}$-Transitive $\mathcal{I}$-Intuitionistic Fuzzy Rough Approximation Operators

Axiomatic characterizations of approximation operators are important in the study of rough set theory. In this paper, axiomatic characterizations of relation-based intuitionistic fuzzy rough approximation operators determined by an intuitionistic fuzzy implication operator $\mathcal{I}$ are investigated. We present a set of axioms of the lower/upper $\mathcal{I}$-intuitionistic fuzzy set-theoretic operator which is necessary and sufficient for the existence of an intuitionistic fuzzy relation producing the same operator. We show that the lower and upper $\mathcal{I}$-intuitionistic fuzzy rough approximation operators generated by an arbitrary intuitionistic fuzzy relation can be described by single axioms. Moreover, the $\mathcal{I}$-intuitionistic fuzzy rough approximation operators generated by reflexive and $\mathcal{T}$-transitive intuitionistic fuzzy relations can also be characterized by single axioms.

Wei-Zhi Wu, You-Hong Xu, Tong-Jun Li, Xia Wang

Granular Computing

Frontmatter

Knowledge Supported Refinements for Rough Granular Computing: A Case of Life Insurance Industry

Dominance-based rough set approach (DRSA) has been adopted in solving various multiple criteria classification problems with positive outcomes; its advantage in exploring imprecise and vague patterns is especially useful given the complexity of certain financial problems in the business environment. Although DRSA may directly process raw data for classification, the obtained decision rules (i.e., knowledge, composed of granules of concepts) would not be close to how domain experts comprehend that knowledge without appropriate discretization of the attributes in practice. As a result, this study proposes a hybrid approach, composed of DRSA and a multiple attribute decision method, to search for suitable approximation spaces of attributes and thus gain applicable knowledge for decision makers (DMs). To illustrate the proposed idea, a case from the life insurance industry in Taiwan is analyzed with initial experiments. The result not only improves the classification accuracy of the DRSA model, but also contributes to the understanding of financial patterns in the life insurance industry.

Kao-Yi Shen, Gwo-Hshiung Tzeng

Building Granular Systems - from Concepts to Applications

Granular Computing (GrC) is a domain of science aiming at modeling computations and reasoning that deals with imprecision, vagueness and incompleteness of information. Computations in GrC are performed on granules which are obtained as a result of information granulation. Principal issues in GrC concern processes of representation, construction, transformation and evaluation of granules. It also requires aligning with some of the fundamental computational issues concerning, e.g., interaction and adaptation. The paper outlines the current status of GrC and provides the general overview of the process of building granular solutions to challenges posed by various real-life problems involving granularity. It discusses the steps that lead from raw data and imprecise/vague specification towards a complete, useful application of granular paradigm.

Marcin Szczuka, Andrzej Jankowski, Andrzej Skowron, Dominik Ślęzak

The Rough Granular Approach to Classifier Synthesis by Means of SVM

In this work we exploit methods, developed so far in the framework of rough mereology, for constructing granular reflections of decision systems, along with kernel methods for building classifiers. In this preliminary report we present results obtained with SVM classification using the RBF kernel. The approximation method we use is the optimized $\varepsilon$ concept-dependent granulation. We experimentally verify the validity of this new approach on test data: Wisconsin Diagnostic Breast Cancer, Fertility Diagnosis, Parkinson Disease and the Prognostic Wisconsin Breast Cancer Database. The results are very promising, as the obtained accuracy is not diminished while the size of the granular decision system is radically reduced.
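For reference, the RBF kernel used with SVM classification has the standard form K(x, y) = exp(-gamma * ||x - y||^2); the sketch below is illustrative, with gamma an assumed parameter value.

```python
# A minimal sketch of the RBF (Gaussian) kernel,
# K(x, y) = exp(-gamma * ||x - y||^2).

import math

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0 (identical points)
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # exp(-1.0), about 0.3679
```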

Jacek Szypulski, Piotr Artiemjew

Data Mining and Machine Learning

Frontmatter

The Boosting and Bootstrap Ensemble for Classifiers Based on Weak Rough Inclusions

In recent works we have investigated classifiers based on weak rough inclusions, especially the 8v1.1 - 8v1.5 algorithms. In the process of forming weights for classification, these algorithms dynamically react to the distance between particular attributes. Our results show the effectiveness of these methods and their wide applicability in many contexts, especially the classification of DNA microarray data. In this work we have checked a few methods for classifier stabilisation, such as the Bootstrap Ensemble, Boosting based on Arcing, and AdaBoost with Monte Carlo split. We have performed experiments on selected data from the UCI Repository. The results show that the committee of weak classifiers stabilised our algorithms in terms of classification accuracy. Boosting based on Arcing turned out to be the most promising method among those examined.

Piotr Artiemjew

Extraction of Off-Line Handwritten Characters Based on a Soft K-Segments for Principal Curves

Principal curves are nonlinear generalizations of principal component analysis: they are smooth, self-consistent curves that pass through the middle of a distribution. By analyzing existing principal curve algorithms, we learn that a soft k-segments algorithm for principal curves exhibits good performance in situations where the data sets are concentrated around highly curved or self-intersecting curves. Feature extraction is critical for improving the recognition rate of off-line handwritten characters. Therefore, we attempt to use the algorithm to extract structural features of off-line handwritten characters. Experimental results show that the algorithm is not only feasible for extracting structural features of characters, but also exhibits good performance. The proposed method provides a new approach to research on extracting structural features of characters.

Na Jiao

A Knowledge Acquisition Model Based on Formal Concept Analysis in Complex Information Systems

Normally, in some complex information systems, the binary relation on the domain of an attribute is just an ordinary binary relation, which need not satisfy common properties such as reflexivity, transitivity or symmetry. In view of this, the paper employs FCA (Formal Concept Analysis) and proposes a rough set model based on FCA, in which equivalence relations, dominance relations, similarity relations (or tolerance relations) and neighborhood relations on the universe are expanded to general binary relations, and problems in rough set theory are discussed on the basis of FCA. In particular, from the above description of complex information systems, we can see that the relation on the domain of an attribute may be extremely complex, which often leads to high time and space complexity in the process of knowledge acquisition. For this reason, the paper introduces granular computing (GrC), which can effectively reduce the complexity to a certain extent.

Xiangping Kang, Duoqian Miao, Na Jiao

The Borda Count, the Intersection and the Highest Rank Method in a Dispersed Decision-Making System

The main aim of the article is to compare the results obtained using three different methods of conflict analysis in a dispersed decision-making system. The conflict analysis methods used in the article are discussed in the paper of Ho, Hull and Srihari [6] and in the book of Black [2]. All these methods are used when the individual classifiers generate rankings of classes instead of unique class choices. The first method is the Borda count, which is a generalization of the majority vote. The second is the intersection method, which belongs to the class set reduction methods. The third is the highest rank method, which belongs to the methods for class set reordering. All of these methods were used in the dispersed decision-making system proposed in the paper [12].
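The Borda count fusion named above can be sketched as follows: each classifier ranks the classes best-first, a class in position p of a ranking over k classes scores k - 1 - p points, and the class with the highest total wins. The implementation is illustrative, not from the paper.

```python
# A small sketch of Borda count fusion of class rankings.

from collections import Counter

def borda(rankings):
    k = len(rankings[0])        # number of classes in each ranking
    scores = Counter()
    for ranking in rankings:
        for pos, cls in enumerate(ranking):
            scores[cls] += k - 1 - pos   # best position earns k-1 points
    return scores.most_common(1)[0][0]

rankings = [['a', 'b', 'c'], ['b', 'a', 'c'], ['a', 'c', 'b']]
print(borda(rankings))  # 'a' (scores: a=5, b=3, c=1)
```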

Małgorzata Przybyła-Kasperek

An Ensemble Learning Approach Based on Missing-Valued Tables

In classification problems on rough sets, the effectiveness of ensemble learning approaches such as bagging, random forests, and attribute sampling ensembles has been reported. We focus on the occurrence of deficiencies in columns of the original decision table in the random forest and attribute sampling ensemble approaches. In this paper, we generalize such deficiencies of columns to deficiencies of cells and propose an ensemble learning approach based on missing-valued decision tables. We confirmed the effectiveness of the proposed method for classification performance through numerical experiments and the two-tailed Wilcoxon signed-rank test. Furthermore, we consider the robustness of the method in the absence of condition attribute values of unknown objects.

Seiki Ubukata, Taro Miyazaki, Akira Notsu, Katsuhiro Honda, Masahiro Inuiguchi

Multiple-Side Multiple-Learner for Incomplete Data Classification

A selective classifier can improve classification accuracy and algorithm efficiency by removing the irrelevant attributes of data. However, most selective classifiers deal with complete data, while actual datasets are often incomplete for various reasons. Incomplete datasets also have irrelevant attributes, which have a negative effect on algorithm performance. By analyzing the main classification methods for incomplete data, this paper proposes a Multiple-side Multiple-learner algorithm for incomplete data (MSML). MSML first obtains a feature subset of the original incomplete dataset based on the chi-square statistic. Then, according to the missing attribute values of the selected feature subset, MSML obtains a group of data subsets. Each data subset is used to train a sub-classifier based on the bagging algorithm. Finally, the results of the different sub-classifiers are combined by weighted majority voting. Experimental results on UCI incomplete datasets show that MSML can effectively reduce the number of attributes and thus improve execution efficiency, while also improving classification accuracy and algorithm stability.
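The chi-square statistic that MSML uses for feature scoring can be computed from a feature-class contingency table in the standard way; this sketch is illustrative, not the paper's code.

```python
# A hedged sketch of the chi-square statistic of a feature against the
# class label, computed from a contingency table.

def chi_square(table):
    # table[i][j]: count of objects with feature value i and class j.
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

print(round(chi_square([[20, 10], [10, 20]]), 3))  # 6.667
```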

Yuan-ting Yan, Yan-Ping Zhang, Xiu-Quan Du

Sparse Matrix Feature Selection in Multi-label Learning

High-dimensional data are commonly met in multi-label learning, and dimensionality reduction is an important and challenging task. In this paper, we propose sparse matrix feature selection to reduce the data dimension in multi-label learning. First, the feature selection problem is formalized via a sparse matrix. Second, a sparse matrix feature selection algorithm is proposed. Third, four feature selection methods are compared with the proposed method, and a parameter optimization analysis is also provided. Experiments show that the proposed algorithm outperforms the other methods in most cases on the tested datasets.

Wenyuan Yang, Bufang Zhou, William Zhu

A Classification Method for Imbalanced Data Based on SMOTE and Fuzzy Rough Nearest Neighbor Algorithm

The FRNN (Fuzzy Rough Nearest Neighbor) algorithm has exhibited good performance in classifying data with inadequate features. However, FRNN does not perform well on imbalanced data. To overcome this problem, this paper introduces a combined method: an improved SMOTE method is adopted to balance the data, and FRNN is applied as the classification method. Experiments show that the combined method obtains better results than the classical FRNN algorithm.
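The core SMOTE interpolation step (before the paper's improvements) places a synthetic minority sample on the segment between a minority sample and one of its minority-class neighbours; the sketch below is illustrative only.

```python
# A minimal sketch of SMOTE-style oversampling: a synthetic minority
# sample lies on the segment between a sample and a chosen neighbour.

import random

def smote_sample(x, neighbours, rng=random.Random(0)):
    nb = rng.choice(neighbours)      # pick a minority-class neighbour
    gap = rng.random()               # position along the segment [x, nb]
    return [xi + gap * (ni - xi) for xi, ni in zip(x, nb)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
synthetic = smote_sample(minority[0], minority[1:])
print(synthetic)  # a point between minority[0] and one neighbour
```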

Weibin Zhao, Mengting Xu, Xiuyi Jia, Lin Shang

Three-Way Decisions

Frontmatter

Region Vector Based Attribute Reducts in Decision-Theoretic Rough Sets

When removing some attributes, the partition induced by a smaller set of attributes will be coarser and the decision regions may be changed. In this paper, we analyze the decision region changes when removing attributes and propose a new type of attribute reducts from the point of view of vector based three-way approximations of a partition. We also present a reduct construction method by using a discernibility matrix.
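For background, the standard three-way decision rule underlying decision-theoretic rough sets assigns an object to the positive, negative or boundary region by comparing its conditional probability with a pair of thresholds (alpha, beta); the threshold values below are assumptions for illustration.

```python
# A sketch of the three-way decision rule: accept when Pr(C|x) >= alpha,
# reject when Pr(C|x) <= beta, defer (boundary) otherwise.

def three_way(prob, alpha=0.7, beta=0.3):
    if prob >= alpha:
        return 'positive'
    if prob <= beta:
        return 'negative'
    return 'boundary'

print([three_way(p) for p in (0.9, 0.5, 0.1)])
# ['positive', 'boundary', 'negative']
```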

Guoshun Huang, Yiyu Yao

How to Evaluate Three-Way Decisions Based Binary Classification?

Appropriate measures are important for evaluating the performance of a classifier. In existing studies, many performance measures designed for two-way decisions based classification are applied directly to three-way decisions based classification, which may result in an incomprehensive evaluation, and there is a lack of systematic research on performance measures for three-way decisions based classification. This paper introduces some numerical measures and graphical measures for three-way decisions based binary classification.

Xiuyi Jia, Lin Shang

A Moderate Attribute Reduction Approach in Decision-Theoretic Rough Set

Attribute reduction is an important topic in decision-theoretic rough set theory. To overcome the limitations of the lower-approximation-monotonicity based reduct and the minimum cost based reduct, a moderate attribute reduction approach is proposed in this paper, which combines the lower approximation monotonicity criterion and the minimum cost criterion. Furthermore, the proposed attribute reduct is searched for by solving an optimization problem, and a fused fitness function is proposed for a genetic algorithm, so that the reduct is computed with low time complexity. Experimental analysis is included to validate the theoretical analysis and quantify the effectiveness of the proposed attribute reduction algorithm. This study indicates that the strict optimum is not always the best choice and a sub-optimum may be preferable.

Hengrong Ju, Xibei Yang, Pei Yang, Huaxiong Li, Xianzhong Zhou

Decisions Tree Learning Method Based on Three-Way Decisions

Traditional data mining methods ignore inconsistent data, and general decision tree learning algorithms lack theoretical support for the classification of inconsistent nodes. To address these problems, three-way decisions are introduced into decision tree learning, and a decision tree learning method based on three-way decisions is proposed. Firstly, the proportion of positive objects in a node is used to compute the conditional probability of the three-way decision for that node. Secondly, the nodes in the decision tree are partitioned to generate the three-way decision tree. Merger and pruning rules for the three-way decision tree are derived to convert it into a two-way decision tree by considering the information around nodes. Finally, an example is implemented. The results show that the proposed method preserves inconsistent information and partitions inconsistent nodes by minimizing the overall risk; it not only generates a cost-sensitive decision tree, but also makes the partition of inconsistent nodes more explicable. Besides, the proposed method reduces overfitting to some extent, and the computation problem of the conditional probability of three-way decisions is resolved.

Yangyang Liu, Jiucheng Xu, Lin Sun, Lina Du
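The node-partition step can be sketched as a thresholded three-way rule on the proportion of positive objects; the thresholds (alpha, beta) below are illustrative, not the cost-derived values of the paper.

```python
def three_way_region(pos_count, total, alpha=0.75, beta=0.25):
    """Assign a decision-tree node to POS/NEG/BND using the proportion
    of positive objects as its conditional probability."""
    p = pos_count / total
    if p >= alpha:
        return 'POS'   # accept: label the node positive
    if p <= beta:
        return 'NEG'   # reject: label the node negative
    return 'BND'       # defer: keep as a boundary (inconsistent) node

print(three_way_region(9, 10))  # 'POS'
print(three_way_region(1, 10))  # 'NEG'
print(three_way_region(5, 10))  # 'BND'
```

Boundary nodes are the ones the paper's merging and pruning rules later resolve into two-way decisions using neighbouring information.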

Multi-decision-makers-based Monotonic Variable Consistency Rough Set Approach with Multiple Attributes and Criteria

From the perspective of Pansystems theory, the paper separates a decision expression system into three parts: the relation system, the decision-making system and the causal system. Across these three systems, the extended approach handles multiple types of attributes and multiple decision-makers, and it aims at modelling data by means of monotonic variable consistency measures. Furthermore, the two thresholds involved can be calculated directly following the Bayesian decision procedure applied in decision-theoretic rough sets. On this basis, the paper proposes a multi-decision-makers-based monotonic variable consistency rough set approach with multiple attributes and criteria, and states and proves its properties.

Wenbin Pei, He Lin, Li Li

Determining Three-Way Decision Regions by Combining Gini Objective Functions and GTRS

The game-theoretic rough set (GTRS) model is a recent advance in determining decision regions by formulating competition or cooperation among multiple measures of those regions. Different competitions can be formulated with GTRS to obtain optimal and balanced decision regions. In three-way decisions there remain issues where GTRS may be employed to reach a compromise between conflicting measures. When the Gini coefficient is used to measure the impurity of decision regions, Gini objective functions may be formulated to optimize the impurities of multiple decision regions. We examine the problem of minimizing the impurities of the immediate and non-commitment decision regions simultaneously. In particular, we use GTRS to determine three-way decision regions by finding a solution to the Gini objective functions. A compromise solution among the various Pareto optimal strategies is obtained with GTRS. The game formulation, Pareto optimal strategies, Nash equilibria of the games, and the iterative learning mechanism are investigated in detail. An example demonstrates that compromise decision regions can be obtained by using GTRS to formulate competition between decision regions.

Yan Zhang, JingTao Yao
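A minimal sketch of the two competing Gini objectives, assuming binary decisions; the thresholds, class probabilities and region sizes below are illustrative, not taken from the paper.

```python
def gini(p):
    """Gini impurity of a binary distribution (p, 1 - p)."""
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def region_impurities(probs, sizes, alpha, beta):
    """Weighted Gini impurity of the immediate (POS + NEG) region and the
    non-commitment (BND) region induced by thresholds (alpha, beta).
    probs[i]: P(positive | equivalence class i); sizes[i]: class size."""
    imm_n = imm_g = bnd_n = bnd_g = 0.0
    for p, n in zip(probs, sizes):
        if p >= alpha or p <= beta:       # immediate decision region
            imm_n += n; imm_g += n * gini(p)
        else:                             # deferred (boundary) region
            bnd_n += n; bnd_g += n * gini(p)
    return (imm_g / imm_n if imm_n else 0.0,
            bnd_g / bnd_n if bnd_n else 0.0)

imm, bnd = region_impurities([0.9, 0.5, 0.1], [10, 10, 10],
                             alpha=0.8, beta=0.2)
print(imm, bnd)  # immediate region is much purer than the boundary
```

A GTRS game would let the two players repeatedly adjust (alpha, beta) to trade one impurity against the other until an equilibrium is reached.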

IJCRS 2015 Data Challenge

Frontmatter

Mining Data from Coal Mines: IJCRS’15 Data Challenge

We summarize the data mining competition associated with the IJCRS'15 conference – IJCRS'15 Data Challenge: Mining Data from Coal Mines, organized on the Knowledge Pit web platform. The topic of this competition was the problem of active safety monitoring in underground corridors. In particular, the task was to design an efficient method of predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We describe the scope of and motivation for the competition. We also report the course of the contest and briefly discuss a few of the most interesting solutions submitted by participants. Finally, we outline our plans for future research on this important subject.

Andrzej Janusz, Marek Sikora, Łukasz Wróbel, Sebastian Stawicki, Marek Grzegorowski, Piotr Wojtas, Dominik Ślęzak

Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift

We describe our submission to the IJCRS'15 Data Mining Competition, where the objective is to predict methane outbreaks from multiple sensor readings. Our solution exploits a selective naive Bayes classifier, with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. One challenging aspect of the competition is that the input variables are not independent and identically distributed (i.i.d.) between the training and test datasets, since the two datasets cover different time periods. We suggest a methodology to alleviate this problem, which enabled us to achieve a final score of 0.9439 (team marcb), second among the 50 competitors.

Marc Boullé
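A minimal Gaussian naive Bayes, sketching the kind of base classifier the submission builds on; the selective-variable and model-averaging machinery is omitted, and the class and data below are illustrative, not the authors' implementation.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Naive Bayes with per-class Gaussian likelihoods per feature."""
    def fit(self, X, y):
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        self.stats, n = {}, len(y)
        for c, rows in by_class.items():
            prior = len(rows) / n
            params = []
            for col in zip(*rows):          # one (mean, variance) per feature
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
                params.append((mu, var))
            self.stats[c] = (prior, params)
        return self

    def predict(self, x):
        best, best_lp = None, float('-inf')
        for c, (prior, params) in self.stats.items():
            lp = math.log(prior)            # log-prior + log-likelihoods
            for v, (mu, var) in zip(x, params):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

X = [[0.1], [0.2], [0.9], [1.0]]
y = [0, 0, 1, 1]
clf = GaussianNB().fit(X, y)
print(clf.predict([0.15]), clf.predict([0.95]))  # 0 1
```

In the submission's setting, the feature vectors would be the automatically constructed variables extracted from the sensor time series.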

Window-Based Feature Engineering for Prediction of Methane Threats in Coal Mines

We present the results of our experiments on methane threat prediction in coal mines, obtained during the IJCRS'15 Data Challenge. The competition task poses the problem of active monitoring and early threat detection, which is essential to prevent spontaneous gas explosions. This issue is very important for the safety of people and equipment as well as for the minimization of production losses. The research was also conducted to verify the effectiveness of the feature engineering framework developed in the DISESOR project. The framework is based on a sliding-window approach and is designed to handle numerous streams of sensor readings.

Marek Grzegorowski, Sebastian Stawicki
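A sliding-window aggregation like the one described might look as follows; the window width and feature names are illustrative assumptions, not the DISESOR framework's actual feature set.

```python
def window_features(series, width):
    """Summarize each sliding window of one sensor's readings into
    simple aggregate features (mean, min, max, first-to-last trend)."""
    rows = []
    for i in range(len(series) - width + 1):
        w = series[i:i + width]
        rows.append({'mean': sum(w) / width,
                     'min': min(w),
                     'max': max(w),
                     'trend': w[-1] - w[0]})
    return rows

readings = [1.0, 2.0, 4.0, 3.0, 5.0]   # one stream of sensor readings
feats = window_features(readings, 3)
print(feats[0])  # window [1, 2, 4]: mean 7/3, min 1.0, max 4.0, trend 3.0
```

Concatenating such per-sensor aggregates across all streams yields one fixed-length feature vector per window, ready for a standard classifier.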

SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting

In this paper we describe our submission to the IJCRS'15 Data Mining Competition, which is concerned with predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We address the challenge of building robust classification models with support vector machines (SVMs) from time series data. Moreover, we investigate the impact of SVM parameter tuning with grid search on classification performance and its effect on preventing over-fitting. Our results show that proper parameter tuning improves predictive performance and also stabilizes the classification models, even when the test data comes from a different time period and class distribution. By applying the proposed method we were able to build a classification model that predicts unseen test data even better than the training data, highlighting its resistance to over-fitting. The submitted solution was about 2 % behind the winning solution.

Petre Lameski, Eftim Zdravevski, Riste Mingov, Andrea Kulakov
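The tuning loop itself is generic and can be sketched as below; the toy threshold classifier stands in for the SVM (to keep the sketch self-contained), and the parameter grid is illustrative.

```python
from itertools import product

def grid_search(train, valid, grid, fit, score):
    """Exhaustive grid search: fit one model per parameter combination
    on the training split, keep the best scorer on the validation split."""
    best_params, best_model, best_score = None, None, float('-inf')
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        model = fit(train, **params)
        s = score(model, valid)
        if s > best_score:
            best_params, best_model, best_score = params, model, s
    return best_params, best_model, best_score

# toy stand-in model: classify x as positive iff x > threshold
def fit(train, threshold):
    return threshold

def score(threshold, data):
    return sum((x > threshold) == y for x, y in data) / len(data)

train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
grid = {'threshold': [0.3, 0.5, 0.7]}
params, model, s = grid_search(train, train, grid, fit, score)
print(params, s)  # {'threshold': 0.5} 1.0
```

With an SVM, `fit` would train on (C, gamma) pairs and `score` would evaluate on a held-out time period, which is what exposes over-fitting to the training distribution.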

Detecting Methane Outbreaks from Time Series Data with Deep Neural Networks

Hazard monitoring systems play a key role in ensuring people's safety. The problem of detecting dangerous levels of methane concentration in a coal mine was the subject of the IJCRS'15 Data Challenge competition. The challenge was to predict, from multivariate time series data collected by sensors, whether the methane concentration would reach a dangerous level in the near future. In this paper we present our solution to this problem, based on an ensemble of Deep Neural Networks. In particular, we focus on Recurrent Neural Networks with Long Short-Term Memory (LSTM) cells.

Krzysztof Pawłowski, Karol Kurach
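For illustration, one forward step of a single scalar LSTM cell, showing the gating that the ensemble members rely on; the weights are hand-picked for the sketch, not learned, and the reading sequence is invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One forward step of a scalar LSTM cell.
    W maps each gate name to its (input, recurrent, bias) weights."""
    f = sigmoid(W['f'][0] * x + W['f'][1] * h + W['f'][2])    # forget gate
    i = sigmoid(W['i'][0] * x + W['i'][1] * h + W['i'][2])    # input gate
    g = math.tanh(W['g'][0] * x + W['g'][1] * h + W['g'][2])  # candidate state
    o = sigmoid(W['o'][0] * x + W['o'][1] * h + W['o'][2])    # output gate
    c_new = f * c + i * g          # cell state carries long-term memory
    h_new = o * math.tanh(c_new)   # hidden state is the cell's output
    return h_new, c_new

W = {k: (1.0, 0.5, 0.0) for k in 'figo'}
h = c = 0.0
for x in [0.2, 0.5, 0.9]:   # a short rising methane-reading sequence
    h, c = lstm_step(x, h, c, W)
print(round(h, 4))
```

In the actual solution, vector-valued cells of this form process whole sensor windows, and a final layer maps the last hidden state to the outbreak probability.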

Self-Organized Predictor of Methane Concentration Warnings in Coal Mines

Coal mining operation continuously balances the trade-off between mining productivity and the risk of hazards such as methane explosions. A dangerous methane concentration is normally the result of increased cutter-loader workload, and it leads to a costly operational shutdown until the increased concentrations abate. We propose a simple yet very robust methane warning prediction model that can forecast imminent high methane concentrations at least 3 minutes in advance, giving enough notice to slow the mining operation, prevent a methane warning and avoid costly shutdowns. Our model is in fact an instance of a generic prediction framework able to rapidly compose a predictor of any future event from aligned time series big data. The model uses a fast greedy backward-forward search applied successively to the design choices of the machine learning pipeline, from data granularity, feature selection, filtering and transformation up to the selection of the predictor, its configuration and complexity. We applied this framework to methane concentration warning prediction in real coal mines as part of the IJCRS'15 data mining competition and scored 3rd place with performance just under 85 %. Our top model emerged from rapid filtering of the large number of sensor time series and eventually used only the latest 1 minute of aggregated data from just a few sensors together with a logistic regression predictor. Many other model setups harnessing multiple linear regression, decision trees, naive Bayes or support vector machine predictors on slightly altered feature sets returned nearly equally good performance.

Dymitr Ruta, Ling Cen
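The final pipeline described above, per-sensor aggregation of the last minute of data followed by logistic regression, can be sketched as follows; the weights, bias and sensor roles are illustrative assumptions, not the competition model's coefficients.

```python
import math

def logistic_warning_score(window, weights, bias):
    """Aggregate the last minute of readings per sensor (mean here) and
    score the warning risk with a logistic model."""
    feats = [sum(s) / len(s) for s in window]    # one mean per sensor
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))            # P(warning)

window = [[0.8, 0.9, 1.0],   # hypothetical methane sensor, rising
          [0.2, 0.2, 0.3]]   # hypothetical airflow sensor
p = logistic_warning_score(window, weights=[3.0, -2.0], bias=-1.5)
print(round(p, 3))
```

The greedy backward-forward search would decide which sensors, aggregates and predictor settings enter this pipeline, but the deployed scorer stays this simple.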

Prediction of Methane Outbreaks in Coal Mines from Multivariate Time Series Using Random Forest

In recent years we have experienced an unprecedented increase in the use of sensors in many industrial applications. Examples include Health and Usage Monitoring Systems (HUMS) for vehicles, so-called intelligent buildings, and instrumentation on machinery used to monitor performance, detect faults and gain insight into operational aspects. Modern sensors are capable not only of generating large volumes of data but also of transmitting that data over a network and storing it for further analysis. However, the collected data requires further analysis in order to provide useful information to the decision makers who want to reduce costs, improve safety, etc. Such analysis has proved to be a challenge, as there are no generic methodologies for automating data analysis, and in practice the cost of analyzing data is prohibitively high for many practical applications. This paper is a step towards developing generic methods for sensor data analysis: it describes an application of a generic method that can be applied to an arbitrary set of multivariate time series in order to perform classification or regression tasks. The presented application concerns the prediction of methane concentrations in coal mines based on time series data from various sensors. The method was tested within the framework of the IJCRS'15 data mining competition and resulted in the winning model, outperforming all other solutions.

Adam Zagorecki
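A toy stand-in for the approach: fixed-length feature vectors (e.g. the window aggregates of each sensor) feeding a bootstrap ensemble. Decision stumps replace full trees to keep the sketch short, and all data below is illustrative, not the competition dataset.

```python
import random

def fit_stump(X, y, feature_ids):
    """Best 'x[f] > t' stump over a random subset of features."""
    best, best_acc = None, -1.0
    for f in feature_ids:
        for t in sorted({x[f] for x in X}):
            acc = sum((x[f] > t) == yi for x, yi in zip(X, y)) / len(y)
            if acc > best_acc:
                best_acc, best = acc, (f, t)
    return best

def fit_forest(X, y, n_trees=15, seed=0):
    """Bootstrap sampling + random feature subsets, as in a random forest."""
    rng = random.Random(seed)
    forest, n, d = [], len(X), len(X[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(d), max(1, d // 2))         # feature subset
        forest.append(fit_stump(Xb, yb, feats))
    return forest

def predict(forest, x):
    """Majority vote of the stumps."""
    votes = sum(1 if x[f] > t else 0 for f, t in forest)
    return 1 if votes * 2 >= len(forest) else 0

X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.25, 0.15],
     [0.80, 0.90], [0.90, 0.80], [0.85, 0.95], [0.95, 0.85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0.9, 0.9]), predict(forest, [0.1, 0.1]))  # 1 0
```

The generic character of the method comes from the feature extraction step: any multivariate time series reduced to such vectors can be handled by the same ensemble without task-specific changes.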

Backmatter
