
2021 | Book


Rough Sets

International Joint Conference, IJCRS 2021, Bratislava, Slovakia, September 19–24, 2021, Proceedings

Edited by: Sheela Ramanna, Chris Cornelis, Davide Ciucci

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

The volume LNAI 12872 constitutes the proceedings of the International Joint Conference on Rough Sets, IJCRS 2021, held in Bratislava, Slovak Republic, in September 2021. The conference was held as a hybrid event due to the COVID-19 pandemic.

The 13 full papers and 7 short papers presented were carefully reviewed and selected from 26 submissions, along with 5 invited papers. The papers are grouped in the following topical sections: core rough set models and methods, related methods and hybridization, and areas of applications.

Table of contents

Frontmatter

Invited Papers

Frontmatter

Mining Incomplete Data Using Global and Saturated Probabilistic Approximations Based on Characteristic Sets and Maximal Consistent Blocks

Abstract

In this paper we discuss incomplete data sets with missing attribute values interpreted as “do not care” conditions. For data mining, we use two types of probabilistic approximations, global and saturated. Such approximations are constructed from two types of granules, characteristic sets and maximal consistent blocks. We present results of experiments on mining incomplete data sets using four approaches, combining two types of probabilistic approximations, global and saturated, with two types of granules, characteristic sets and maximal consistent blocks. We compare these four approaches, using an error rate computed as the result of ten-fold cross validation. We show that there are significant differences (5% level of significance) between these four approaches to data mining. However, there is no universally best approach. Hence, for an incomplete data set, the best approach to data mining should be chosen by trying all four approaches.

Patrick G. Clark, Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe, Teresa Mroczek

Determining Tanimoto Similarity Neighborhoods of Real-Valued Vectors by Means of the Triangle Inequality and Bounds on Lengths

Abstract

The Tanimoto similarity is widely used in chemo-informatics, biology, bio-informatics, text mining and information retrieval to determine neighborhoods of sufficiently similar objects or k most similar objects represented by real-valued vectors. For metrics such as the Euclidean distance, the triangle inequality property is often used to efficiently identify vectors that may belong to the sought neighborhood of a given vector. Nevertheless, neither the Tanimoto similarity nor the Tanimoto dissimilarity fulfills the triangle inequality property for real-valued vectors. In spite of this, in this paper, we show that the problem of looking for a neighborhood with respect to the Tanimoto similarity among real-valued vectors is equivalent to the problem of looking for a neighborhood among normalized forms of these vectors in the Euclidean space. Based on this result, we propose a method that uses the triangle inequality to losslessly identify promising candidates for members of Tanimoto similarity neighborhoods among real-valued vectors. The method requires pre-calculation and storage of the distances from normalized forms of real-valued vectors to a so-called reference vector. The normalized forms of vectors themselves do not need to be stored after the pre-calculation of these distances. We also propose two variants of a new combined method which, apart from the triangle inequality, also uses bounds on vector lengths to determine Tanimoto similarity neighborhoods. The usefulness of the new and related methods is illustrated with examples.

Marzena Kryszkiewicz
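
To make the notion of a Tanimoto similarity neighborhood in the abstract above concrete, here is a minimal Python sketch. It uses the standard definition of the Tanimoto similarity for real-valued vectors, T(u, v) = u·v / (|u|² + |v|² − u·v), and finds a neighborhood by brute force. It does not implement the triangle-inequality pruning proposed in the paper; the function names and the threshold parameter `eps` are illustrative assumptions.

```python
def tanimoto_similarity(u, v):
    """Tanimoto similarity of two real-valued vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    # |u|^2 + |v|^2 - u.v is zero only when both vectors are zero vectors.
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    if denom == 0:
        raise ValueError("Tanimoto similarity is undefined for two zero vectors")
    return dot / denom


def tanimoto_neighborhood(query, vectors, eps):
    """Brute-force baseline (hypothetical helper): indices of all vectors
    whose Tanimoto similarity to `query` is at least `eps`."""
    return [i for i, v in enumerate(vectors)
            if tanimoto_similarity(query, v) >= eps]
```

A pruning method such as the one described in the abstract would aim to return the same neighborhood while evaluating the similarity for fewer candidate vectors.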

Rough-Fuzzy Segmentation of Brain MR Volumes: Applications in Tumor Detection and Malignancy Assessment

Abstract

Magnetic resonance (MR) imaging is an important diagnostic technique for providing accurate information about the spatial distribution of brain soft tissues non-invasively. In MR images, different imaging artifacts give rise to uncertainties in segmenting the brain volume into major soft tissue classes, as well as in extracting a brain tumor and evaluating its malignancy state. Among various soft computing techniques, rough sets provide a powerful tool to handle uncertainties and incompleteness associated with data, while fuzzy sets serve as an analytical tool for dealing with uncertainty that arises due to the overlapping characteristics in the data. In this regard, the paper presents a brief review of the recent advances in rough-fuzzy hybridized approaches for brain MR volume segmentation, brain tumor detection and gradation.

Pradipta Maji, Shaswati Roy

DDAE-GAN: Seismic Data Denoising by Integrating Autoencoder and Generative Adversarial Network

Abstract

Machine learning methods face two main challenges in denoising tasks. One is the lack of supervised training data, and the other is the limited knowledge of complex unknown noise. In this paper, for seismic denoising, we propose a new method with three techniques to handle them effectively. First, a Generative Adversarial Network (GAN) is employed to generate a large number of paired clean-noisy data using real noise. Second, a deep denoising autoencoder (DDAE) is pre-trained using these data. Third, a transfer learning technique is used to train the DDAE further on a few field data. We have assessed the proposed method based on qualitative and quantitative analysis. Results show that the method can suppress seismic data noise well.

Fan Min, Lin-Rong Wang, Shu-Lin Pan, Guo-Jie Song

Classification of Multi-class Imbalanced Data: Data Difficulty Factors and Selected Methods for Improving Classifiers

Abstract

The multi-class imbalance problem is still less investigated than its binary counterpart. In particular, the sources of its difficulties have not been sufficiently studied so far. Therefore, in this paper we summarize the few works in the literature on the difficulty factors and present our own latest research results. The binary method for identifying the types of minority examples is generalized to multiple imbalanced classes. The second part of this paper presents three of our recent methods for learning classifiers from multi-class imbalanced data, which exploit information on the aforementioned difficulty factors.

Jerzy Stefanowski

Core Rough Set Models and Methods

Frontmatter

General Rough Modeling of Cluster Analysis

Abstract

In this research a general theoretical framework for clustering is proposed over specific partial algebraic systems by the present author. Her theory helps in isolating minimal assumptions necessary for different concepts of clustering information in any form to be realized in a situation (and therefore in a semantics). It is well-known that of the limited number of proofs in the theory of hard and soft clustering that are known to exist, most involve statistical assumptions. Many methods seem to work because they seem to work in specific empirical practice. A new general rough method of analyzing clusterings is invented, and this opens the subject to clearer conceptions and contamination-free theoretical proofs. Numeric ideas of validation are also proposed to be replaced by those based on general rough approximation. The essential approach is explained in brief and supported by an example.

A. Mani

Possible Coverings in Incomplete Information Tables with Similarity of Values

Abstract

Rough sets are described by an approach using possible coverings in an incomplete information table with similarity of values. Many possible coverings are derived in an incomplete information table. This seems to cause difficulty due to computational complexity, but it does not, because the family of possible coverings has a lattice structure. Four approximations that make up a rough set are derived by using only two coverings: the minimum and maximum possible ones, which are derived from the minimum and the maximum possible indiscernibility relations that are equal to the intersection and the union of those from possible tables. The approximations are equal to those derived using the minimum and the maximum possibly indiscernible classes.

Michinori Nakata, Norio Saito, Hiroshi Sakai, Takeshi Fujiwara

Attribute Reduction Using Functional Dependency Relations in Rough Set Theory

Abstract

This paper presents some functional dependency relations defined on the attribute set of an information system. We establish some basic relationships between functional dependency relations, attribute reduction, and closure operators. We use the partial order for dependencies to show that reducts of an information system can be obtained from the maximal elements of a functional dependency relation.

Mauricio Restrepo, Chris Cornelis
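
As a rough illustration of the kind of dependency the abstract above refers to, the following Python sketch checks the classical functional dependency A → B on an information table: objects that agree on every attribute in A must also agree on every attribute in B. This classical notion is an assumption here; the paper defines its own dependency relations, which may differ. The table layout (a list of dictionaries) and the function name `holds` are illustrative choices.

```python
def holds(table, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `table`.

    `table` is a list of dicts (one per object); `lhs` and `rhs` are
    lists of attribute names. Hypothetical helper for illustration only.
    """
    seen = {}
    for row in table:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first object with this lhs-value fixes the expected rhs-value;
        # any later object that disagrees violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True
```

For example, with a decision attribute `d`, an attribute subset `A` for which `holds(table, A, ["d"])` is true preserves the decision in this classical sense, which is the usual starting point for searching for reducts.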

The RSDS: A Current State and Future Plans

Abstract

This paper provides a brief overview of the Rough Set Database System (the RSDS for short) for creating, sharing and analyzing bibliographies on rough sets and related fields. The current version of the RSDS includes a number of modifications, extensions and functional improvements compared to the previous versions of this system. The system was built using client-server technology. Currently, the RSDS contains over 38 540 entries from nearly 42 860 authors. This system works on any computer connected to the Internet and is available at http://rsds.ur.edu.pl.

Zbigniew Suraj, Piotr Grochowalski

Many-Valued Dynamic Object-Oriented Inheritance and Approximations

Abstract

The majority of contemporary software systems are developed using object-oriented tools and methodologies, where constructs like classes, inheritance and objects are first-class citizens. In the current paper we provide a novel formal framework for many-valued object-oriented inheritance in rule-based query languages. We also relate the framework to rough set-like approximate reasoning. Rough sets and their generalizations have intensively been studied and applied. However, the mainstream of the area mainly focuses on the context of information and decision tables. Therefore, approximations defined in the much richer object-oriented contexts generalize known approaches.

Andrzej Szałas

Related Methods and Hybridization

Frontmatter

Minimizing Depth of Decision Trees with Hypotheses

Abstract

In this paper, we consider decision trees that use both conventional queries based on one attribute each and queries based on hypotheses about values of all attributes. Such decision trees are similar to ones studied in exact learning, where membership and equivalence queries are allowed. We present dynamic programming algorithms for minimizing the depth of such decision trees and discuss results of computer experiments on various data sets and randomly generated Boolean functions.

Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov

The Influence of Fuzzy Expectations on Triples of Triangular Norms in the Weighted Fuzzy Petri Net for the Subject Area of Passenger Transport Logistics

Abstract

This paper continues the analysis of the application of different triples of t-/s-norms and their results in the weighted fuzzy Petri nets for the subject area of passenger transport logistics. The analysis covers a range of 27 different triples of functions which are located in-between the minimal (LtN, LtN, ZsN) and maximal (optimized) (ZtN, ZtN, LsN) triples. It also includes the classical triple (ZtN, GtN, ZsN), which is located exactly in the middle of this range and remains a good starting point for comparing the achieved results. This paper takes a deeper look at the already achieved numerical values as well as decisions, and proposes a new approach intended to unleash the full potential of the net and the applied triples of functions. The idea is to take the user's expectations into account. Therefore, the decision-support system provides results based not only on the input values which were previously filled in by the experts in the corresponding subject area, but also on the expectations, which can be either met or rejected in the process of calculation.

Yurii Bloshko, Zbigniew Suraj, Oksana Olar

Possibility Distributions Generated by Intuitionistic L-Fuzzy Sets

Abstract

In this work, we bridge possibility theory with intuitionistic \(\textsf{L}\)-fuzzy sets, by identifying a special class of possibility distributions corresponding to intuitionistic \(\textsf{L}\)-fuzzy sets based on a complete residuated lattice with an involution. Moreover, taking the Łukasiewicz n-chains as structures of truth degrees, we propose an algorithm to compute the intuitionistic \(\textsf{L}\)-fuzzy set corresponding to a given possibility distribution, in case it exists.

Stefania Boffa, Davide Ciucci

Feature Selection and Disambiguation in Learning from Fuzzy Labels Using Rough Sets

Abstract

In this article, we study the setting of learning from fuzzy labels, a generalization of supervised learning in which instances are assumed to be labeled with a fuzzy set, interpreted as an epistemic possibility distribution. We tackle the problem of feature selection in such a task, in the context of rough set theory (RST). More specifically, we consider the problem of RST-based feature selection as a means for data disambiguation: that is, retrieving the most plausible precise instantiation of the imprecise training data. We define generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. We study the computational complexity and theoretical properties of the associated computational problems.

Andrea Campagner, Davide Ciucci

Right Adjoint Algebras Versus Operator Left Residuated Posets

Abstract

Algebraic structures are essential in fuzzy frameworks such as fuzzy formal concept analysis and fuzzy rough set theory. This paper studies two general structures, right adjoint algebras and operator left residuated posets, introducing several properties which relate them. Different extensions of the operators included in a given operator left residuated poset are presented, and a reasoned analysis is given to show that the equivalence satisfied by the operators in this structure is not a generalization of the usual adjoint property, which is a basic property verified by right adjoint pairs. Operator left residuated posets are also studied in the framework of the Dedekind-MacNeille completion of a poset.

M. Eugenia Cornejo, Jesús Medina

Adapting Fuzzy Rough Sets for Classification with Missing Values

Abstract

We propose an adaptation of fuzzy rough sets to model concepts in datasets with missing values. Upper and lower approximations are replaced by interval-valued fuzzy sets that express the uncertainty caused by incomplete information. Each of these interval-valued fuzzy sets is delineated by a pair of optimistic and pessimistic approximations. We show how this can be used to adapt Fuzzy Rough Nearest Neighbour (FRNN) classification to datasets with missing values. In a small experiment with real-world data, our proposal outperforms simple imputation with the mean and mode on datasets with a low missing value rate.

Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

Areas of Applications

Frontmatter

Spark Accelerated Implementation of Parallel Attribute Reduction from Incomplete Data

Abstract

Attribute reduction is a significant data preprocessing step for overcoming the challenges posed by multidimensional data analysis. Missing values in the data are usually unavoidable in real applications, so it is important to efficiently select features with high importance in incomplete data. The theory of rough sets has been widely used in attribute reduction for uncertain data mining. To enable rough set theory for large-scale incomplete data analysis, this paper develops a novel distributed attribute reduction algorithm based on the Apache Spark cluster computing platform. By taking advantage of the positive approximation technique to gradually reduce data broadcasting while iteratively removing each redundant attribute, the proposed algorithm can significantly accelerate attribute reduction by leveraging a computer cluster when processing large-scale incomplete data. Numerical experiments on different UCI data sets provide evidence that the proposed parallel algorithm achieves high performance in terms of extensibility and scalability.

Qian Cao, Chuan Luo, Tianrui Li, Hongmei Chen

Attention Enhanced Hierarchical Feature Representation for Three-Way Decision Boundary Processing

Abstract

For binary classification, the three-way decision divides samples into a positive region (POS), a negative region (NEG), and a boundary region (BND). The correct division of the boundary data helps to improve the accuracy of binary classification. However, constructing the optimal feature representation from certain samples for boundary domain partition is a challenge. In this paper, we propose attention enhanced hierarchical feature representation for three-way decision boundary processing (AHT) to deal with the boundary region. Based on the three-way decision, certain regions (positive, negative) and boundary regions are obtained. Hierarchical feature representations are then obtained on the positive domain and the negative domain respectively. Finally, an attention-enhanced fusion feature representation is constructed to guide the boundary domain division of the testing set. The experimental results on five UCI datasets show that our algorithm effectively improves binary classification accuracy.

Jie Chen, Yue Chen, Yang Xu, Shu Zhao, Yanping Zhang

An Opinion Summarization-Evaluation System Based on Pre-trained Models

Abstract

As social media are used more frequently, the task of extracting the mainstream opinions of the discussions arising from the media, i.e. opinion summarization, has drawn considerable attention. This paper proposes an opinion summarization-evaluation system containing a pipeline and an evaluation module for the task. In our algorithm, the state-of-the-art pre-trained model BERT is fine-tuned for the subjectivity analysis, and the advanced pre-trained models are combined with traditional data mining algorithms to obtain the mainstream opinions. For evaluation, a set of hierarchical metrics is also stated. Experimental results show that our algorithm produces concise and major opinions. An ablation study is also conducted to show that each part of the pipeline contributes significantly.

Han Jiang, Yubin Wang, Songhao Lv, Zhihua Wei

Fuzzy-Rough Nearest Neighbour Approaches for Emotion Detection in Tweets

Abstract

Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets.

Specifically, we develop an approach for the SemEval-2018 emotion detection task, based on the fuzzy rough nearest neighbour (FRNN) classifier enhanced with ordered weighted average (OWA) operators. We use tuned ensembles of FRNN–OWA models based on different text embedding methods. Our results are competitive with the best SemEval solutions based on more complicated deep learning methods.

Olha Kaminska, Chris Cornelis, Veronique Hoste

Three-Way Decisions Based RNN Models for Sentiment Classification

Abstract

Recurrent neural networks (RNNs) have been widely used in sentiment classification. An RNN can memorize previous information and apply it to calculate the current output. For binary sentiment classification, an RNN calculates the probabilities and then performs binary classification according to the probability values, so some emotions near the median are forcibly divided. However, it does not consider the existence of samples that are not clearly polarized. Three-way decisions theory divides the dataset into three regions: positive region, negative region, and boundary region. In the process of training classification, the probabilities of some samples belonging to different categories are very close, and three-way decisions can place them in the boundary region by setting thresholds. Reasonable processing of the boundary region, by adjusting the probability of samples in the boundary region, can yield better results for binary classification. Therefore, in this paper, we propose three-way decisions based RNN models for sentiment classification. Firstly, we use basic RNN models to classify the data. Secondly, we apply three-way decisions theory to set the thresholds and divide the boundary region based on probability. Finally, the probabilities of samples in the boundary region are adjusted and applied in the next round of training. Experiments on four real datasets show that our proposed models are better than the corresponding basic RNN models in terms of classification accuracy.

Yan Ma, Jingying Yu, Bojing Ji, Jie Chen, Shu Zhao, Jiajun Chen
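
The thresholding step described in the abstract above can be sketched as follows. This is a generic illustration of a three-way split of predicted probabilities into positive, boundary, and negative regions, not the authors' implementation; the function name and the default thresholds `alpha=0.7` and `beta=0.3` are assumptions chosen only for the example.

```python
def three_way_regions(probabilities, alpha=0.7, beta=0.3):
    """Split sample indices into POS, BND and NEG regions.

    A sample with probability p of the positive class goes to POS if
    p >= alpha, to NEG if p <= beta, and to BND otherwise.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    regions = {"POS": [], "BND": [], "NEG": []}
    for index, p in enumerate(probabilities):
        if p >= alpha:
            regions["POS"].append(index)
        elif p <= beta:
            regions["NEG"].append(index)
        else:
            regions["BND"].append(index)
    return regions
```

Samples that land in `BND` are the ones whose class probabilities are too close to decide on directly; in the approach described above, their probabilities would be adjusted before the next round of training.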

Tolerance-Based Short Text Sentiment Classifier

Abstract

Sentiment classification identifies the polarity of text, such as positive, negative or neutral, based on textual features. A tolerance near set-based text classifier (TSC) is introduced in this paper to classify sentiment polarities of text with vectors from a pre-trained SBERT algorithm. One of the datasets (Covid-Sentiment) was hand-crafted from tweets expressing opinions related to COVID. Experiments using a weighted F1-score demonstrate that TSC outperforms five classical ML algorithms on one dataset and is comparable on all other datasets.

Vrushang Patel, Sheela Ramanna

Knowledge Graph Representation Learning for Link Prediction with Three-Way Decisions

Abstract

Relation prediction is one of the important tasks of knowledge graph completion, which aims to predict the missing links between entities. Although many different methods have been proposed, most of them follow the closed-world assumption. Specifically, these methods simply treat the unknown triples as errors, which results in the loss of valuable information contained in the knowledge graphs (KGs). In addition, KGs contain a large number of long-tail relations, which lack sufficient triples for training, and these relations seriously affect inference performance. In order to address the above-mentioned problems, we propose a novel relation prediction method based on three-way decisions, namely RP-TWD. The RP-TWD model first obtains the similarity between relations by K-Nearest Neighbors (KNN) to model the semantic associations between them. The semantic association between relations can be considered as supplementary information for long-tail relations, and constrains the learning of KG embeddings. Then, based on the idea of three-way decisions (TWD), the triples of a specific relation are further divided into three disjoint regions, namely positive region (POS), boundary region (BND), and negative region (NEG). The introduction of BND is intended to represent the uncertainty information contained in unknown triples. The experimental results show that our model has significant advantages in the task of relation prediction compared with baselines.

Zhihan Peng, Hong Yu

PNeS in Modelling, Control and Analysis of Concurrent Systems

Abstract

The paper describes the extended and improved version of the Petri Net System (PNeS) compared to the version published in 2017. PNeS is an integrated graphical computer tool for building, modifying, analyzing Petri nets, as well as controlling a mobile robot. It runs on any computer under any operating system. PNeS can be useful for researchers, educators and practitioners, from both academia and industry, who are actively involved in the work of modelling and analyzing concurrent systems, and for those who have the potential to be involved in these areas.

Zbigniew Suraj, Piotr Grochowalski

3RD: A Multi-criteria Decision-Making Method Based on Three-Way Rankings

Abstract

By combining ideas from three-way decision theory, prospect theory, and several families of multi-criteria decision-making (MCDM) methods, including ELECTRE, PROMETHEE, TODIM, and dominance-based rough set analysis (DRSA), we propose a new ranking-based MCDM method called 3RD. With respect to a single criterion, we construct a three-way ranking (i.e., trilevel ranking) of a set of decision alternatives by using an alternative as a reference in the sense of prospect theory and a family of three-way rankings from all alternatives. With respect to a set of criteria, we have multiple families of three-way rankings. By adopting the TODIM procedure, we introduce a ranking function to rank the set of alternatives according to these multiple families of trilevel rankings.

Yiyu Yao, Chengjun Shi

Backmatter

Title: Rough Sets
Edited by: Sheela Ramanna
Chris Cornelis
Davide Ciucci
Publisher: Springer International Publishing
Electronic ISBN: 978-3-030-87334-9
Print ISBN: 978-3-030-87333-2
DOI: https://doi.org/10.1007/978-3-030-87334-9