
2018 | Book

Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations

17th International Conference, IPMU 2018, Cádiz, Spain, June 11-15, 2018, Proceedings, Part II

Editors: Jesús Medina, Manuel Ojeda-Aciego, José Luis Verdegay, David A. Pelta, Inma P. Cabrera, Bernadette Bouchon-Meunier, Ronald R. Yager

Publisher: Springer International Publishing

Book Series: Communications in Computer and Information Science


About this book

This three-volume set (CCIS 853-855) constitutes the proceedings of the 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2018, held in Cádiz, Spain, in June 2018.

The 193 revised full papers were carefully reviewed and selected from 383 submissions. The papers are organized in topical sections on advances on explainable artificial intelligence; aggregation operators, fuzzy metrics and applications; belief function theory and its applications; current techniques to model, process and describe time series; discrete models and computational intelligence; formal concept analysis and uncertainty; fuzzy implication functions; fuzzy logic and artificial intelligence problems; fuzzy mathematical analysis and applications; fuzzy methods in data mining and knowledge discovery; fuzzy transforms: theory and applications to data analysis and image processing; imprecise probabilities: foundations and applications; mathematical fuzzy logic and mathematical morphology; measures of comparison and entropies for fuzzy sets and their extensions; new trends in data aggregation; pre-aggregation functions and generalized forms of monotonicity; rough and fuzzy similarity modelling tools; soft computing for decision making in uncertainty; soft computing in information retrieval and sentiment analysis; tri-partitions and uncertainty; decision making modeling and applications; logical methods in mining knowledge from big data; metaheuristics and machine learning; optimization models for modern analytics; uncertainty in medicine; and uncertainty in video/image processing (UVIP).

Table of Contents

Frontmatter

Fuzzy Methods in Data Mining and Knowledge Discovery

Frontmatter
Fuzzy Analysis of Sentiment Terms for Topic Detection Process in Social Networks

The aim of this paper is to analyze the influence of sentiment-related terms on the automatic detection of topics in social networks. The study is based on an ontology, extended with the capacity to gradually identify and discard sentiment terms in social network texts, as these terms do not provide useful information for detecting topics. To detect these terms, we have used two resources focused on sentiment analysis. The proposed system has been assessed with real datasets from the social networks Twitter and Dreamcatcher, in English and Spanish respectively.

Karel Gutiérrez-Batista, Jesús R. Campaña, Maria-Amparo Vila, Maria J. Martin-Bautista
Fuzzy Association Rules Mining Using Spark

Discovering new trends and co-occurrences in massive data is a key step when analysing social media, data coming from sensors, etc. Traditional Data Mining techniques are, on many occasions, unable to handle such amounts of data. For this reason, some approaches have arisen in the last decade to develop parallel and distributed versions of previously known techniques. Frequent itemset mining is no exception, and in the literature there exist several proposals using not only parallel approaches but also Spark and Hadoop developments following the MapReduce philosophy of Big Data. When processing fuzzy datasets or extracting fuzzy associations from crisp data, the implementation of such Big Data solutions becomes crucial, since available algorithms increase their execution time and memory consumption due to the problem of not having Boolean items. In this paper, we first review existing parallel and distributed algorithms for frequent itemset and association rule mining in the crisp and fuzzy cases, and afterwards we develop a preliminary proposal for mining not only frequent fuzzy itemsets but also fuzzy association rules. We also study the performance of the proposed algorithm on several datasets that have been conveniently fuzzified, obtaining promising results.
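The authors' Spark implementation is not reproduced here, but the fuzzy-support computation at the core of fuzzy itemset and rule mining can be sketched in plain Python. The minimum t-norm and the attribute names are illustrative assumptions, not taken from the paper:

```python
def fuzzy_support(transactions, itemset):
    """Fuzzy support: each transaction supports the itemset to the degree
    of its least-satisfied item (minimum t-norm), averaged over all rows."""
    total = 0.0
    for t in transactions:
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(transactions)

def fuzzy_confidence(transactions, antecedent, consequent):
    """Confidence of the fuzzy rule antecedent -> consequent."""
    return (fuzzy_support(transactions, antecedent + consequent)
            / fuzzy_support(transactions, antecedent))

# Each row maps a fuzzy item to its membership degree in [0, 1].
data = [
    {"age_young": 0.9, "income_high": 0.2},
    {"age_young": 0.6, "income_high": 0.7},
    {"age_young": 0.1, "income_high": 0.8},
]
print(fuzzy_support(data, ["age_young", "income_high"]))  # ≈ 0.3
```

In a MapReduce setting the per-transaction minima would be computed in the map phase and summed in the reduce phase; the aggregate itself is unchanged.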

Carlos Fernandez-Basso, M. Dolores Ruiz, Maria J. Martin-Bautista
A Typology of Data Anomalies

Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism on ‘black box’ algorithms and analytics it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and as a framework assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.

Ralph Foorthuis
IF-CLARANS: Intuitionistic Fuzzy Algorithm for Big Data Clustering

Clustering is one of the most important and basic techniques in data mining; it aims to group a collection of samples into clusters based on similarity. Clustering big datasets has always been a serious challenge due to their high dimensionality and complexity. In this paper, we propose a novel clustering algorithm which introduces the concept of intuitionistic fuzzy set theory into the framework of CLARANS for handling uncertainty in the context of mining big datasets. We also suggest a new scalable approximation to compute the maximum number of neighbors. Our experimental evaluation on real datasets shows that the proposed algorithm obtains satisfactory clustering results and outperforms other current methods. Cluster quality was evaluated by three well-known metrics.
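The abstract does not give the algorithm itself; as background, an intuitionistic fuzzy value pairs a membership degree with a non-membership degree, and clustering over such values needs a distance between them. One common choice in the literature (an assumption here, not necessarily the one used in IF-CLARANS) is the normalised Euclidean distance:

```python
def ifs_distance(a, b):
    """Normalised Euclidean distance between intuitionistic fuzzy values
    a = (mu, nu): membership mu, non-membership nu, hesitation 1 - mu - nu."""
    (m1, n1), (m2, n2) = a, b
    p1, p2 = 1 - m1 - n1, 1 - m2 - n2
    return (((m1 - m2) ** 2 + (n1 - n2) ** 2 + (p1 - p2) ** 2) / 2) ** 0.5

print(ifs_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0: maximally dissimilar values
```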

Hechmi Shili, Lotfi Ben Romdhane
Semi-supervised Fuzzy c-Means Variants: A Study on Noisy Label Supervision

Semi-supervised clustering algorithms aim at discovering the hidden structure of data sets with the help of expert knowledge, generally expressed as constraints on the data such as class labels or pairwise relations. Most of the time, the expert is considered as an oracle that only provides correct constraints. This paper focuses on the case where some label constraints are erroneous and proposes to investigate in more detail three semi-supervised fuzzy c-means clustering approaches, as they are tailored to naturally handle uncertainty in the expert labeling. In order to run a fair comparison between existing algorithms, formal improvements are proposed to guarantee and speed up their convergence. Experiments conducted on real and synthetic datasets with uncertain labels and noise in the constraints show the effectiveness of fuzzy clustering algorithms for noisy semi-supervised clustering.
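The semi-supervised variants studied in the paper all extend the same unsupervised core. A minimal 1-D fuzzy c-means sketch (deterministic initialisation and 1-D data are simplifying assumptions; the label-constraint terms of the semi-supervised variants are omitted):

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=100):
    """Minimal 1-D fuzzy c-means: alternate membership and center updates.
    Initial centers are spread deterministically over the data range."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    for _ in range(iters):
        # Membership of point k in cluster i from relative distances.
        u = []
        for x in xs:
            d = [max(abs(x - ck), 1e-12) for ck in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c)) for i in range(c)])
        # Centers become fuzzy-weighted means of the data.
        centers = [sum(u[k][i] ** m * xs[k] for k in range(len(xs))) /
                   sum(u[k][i] ** m for k in range(len(xs))) for i in range(c)]
    return centers, u

centers, memberships = fuzzy_c_means([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

On these two well-separated groups the centers converge close to 0.1 and 5.1; a semi-supervised variant would add a penalty pulling the memberships of labelled points toward their (possibly noisy) labels.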

Violaine Antoine, Nicolas Labroche
Towards a Hierarchical Extension of Contextual Bipolar Queries

We are concerned with bipolar database queries in which the query is composed of a necessary (required) part and an optional (desired) part connected with a non-conventional aggregation operator "and possibly", combined with context, as, for instance, in the query "find houses which are cheap and – with respect to other houses in town – possibly close to a railroad station". We deal with a multivalued-logic-based interpretation of bipolar queries. We assume that the human user, usually a database novice, tends to use general terms in natural-language queries, which do not directly relate to attributes, and via a question-and-answer process these terms are "decoded" using a concept hierarchy that in the end involves terms directly related to attribute values. We propose a novel extension of our contextual hierarchical bipolar database query in which the original query is considered a level-0 query at the bottom of the precisiation hierarchy; its required and optional parts are then assumed to be bipolar queries themselves, with an account of context. This makes it possible to further precisiate the user's intentions/preferences. A level 1 of precisiation is obtained, and the process is continued as far as is necessary for the user to adequately reflect his or her intentions/preferences as to what is sought. The new concept is demonstrated on an intuitively appealing real estate example, which serves both as an illustration of the idea of our approach and as a real example.

Janusz Kacprzyk, Sławomir Zadrożny
Towards an App Based on FIWARE Architecture and Data Mining with Imperfect Data

In this work, the structure for the prototype construction of an application that can be framed within ubiquitous sensing is proposed. The objective of the application is to let a user know, through his or her mobile device, which other users in the environment are doing the same activity. The knowledge is therefore obtained from data acquired by pervasive sensors. The FIWARE infrastructure is used to homogenize the data flows. An important element of the application is the Intelligent Data Analysis module where, within the Apache Storm technology, a Data Mining technique is used. This module identifies the activity carried out by the mobile device user based on the values obtained from the different sensors of the device. The Data Mining technique used in this module is an extension of the Nearest Neighbors technique. This extension allows imperfect data to be processed, thereby reducing the effort that must be made in data preprocessing to obtain the minable view of the data. It also allows us to parallelize part of the process using the Apache Storm technology.

Jose M. Cadenas, M. Carmen Garrido, Cristina Villa
A Fuzzy Close Algorithm for Mining Fuzzy Association Rules

Association rules allow large datasets to be mined to automatically discover relations between variables. In order to take into account both qualitative and quantitative variables, fuzzy logic has been applied and many association rule extraction algorithms have been fuzzified. In this paper, we propose a fuzzy adaptation of the well-known Close algorithm, which relies on the closure of itemsets. The Close algorithm needs fewer passes over the dataset and is suitable when variables are correlated. The algorithm is then compared to others on public datasets.

Régis Pierrard, Jean-Philippe Poli, Céline Hudelot
Datil: Learning Fuzzy Ontology Datatypes

Real-world applications using fuzzy ontologies have been increasing in recent years, but the problem of fuzzy ontology learning has not received much attention. While most previous approaches focus on the problem of learning fuzzy subclass axioms, we focus on learning fuzzy datatypes. In particular, we describe the Datil system, an implementation that uses unsupervised clustering algorithms to automatically obtain fuzzy datatypes from different input formats. We also illustrate its practical usefulness with an application: semantic lifestyle profiling.

Ignacio Huitzil, Umberto Straccia, Natalia Díaz-Rodríguez, Fernando Bobillo

Fuzzy Transforms: Theory and Applications to Data Analysis and Image Processing

Frontmatter
Axiomatic of Inverse Lattice-Valued F-transform

Axioms of two versions of inverse fuzzy transformation systems are introduced, and it is proved that a transformation function satisfies these axioms if and only if it is an upper or lower inverse lattice-valued F-transform with respect to a fuzzy partition. Categories of inverse transformation systems are introduced, and it is proved that these categories are isomorphic to the category of spaces with fuzzy partitions.

Jiří Močkoř
Why Triangular Membership Functions are Often Efficient in F-transform Applications: Relation to Probabilistic and Interval Uncertainty and to Haar Wavelets

Fuzzy techniques describe expert opinions. At first glance, we would therefore expect that the more accurately the corresponding membership functions describe the expert’s opinions, the better the corresponding results. In practice, however, contrary to these expectations, the simplest – and not very accurate – triangular membership functions often work the best. In this paper, on the example of the use of membership functions in F-transform techniques, we provide a possible theoretical explanation for this surprising empirical phenomenon.
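The F-transform with a uniform triangular partition, the setting the paper analyses, can be sketched as follows. The uniform node spacing and the sampling grid are illustrative choices, not taken from the paper:

```python
def direct_ft(f_vals, xs, nodes, h):
    """Direct F-transform: component F_k is the weighted mean of f under the
    triangular basis function A_k(x) = max(0, 1 - |x - x_k| / h)."""
    comps = []
    for xk in nodes:
        w = [max(0.0, 1.0 - abs(x - xk) / h) for x in xs]
        comps.append(sum(wi * fi for wi, fi in zip(w, f_vals)) / sum(w))
    return comps

def inverse_ft(comps, x, nodes, h):
    """Inverse F-transform: basis-weighted combination of the components."""
    return sum(F * max(0.0, 1.0 - abs(x - xk) / h)
               for F, xk in zip(comps, nodes))

xs = [i / 1000.0 for i in range(1001)]    # sampling grid on [0, 1]
nodes = [0.0, 0.25, 0.5, 0.75, 1.0]       # uniform fuzzy-partition nodes
comps = direct_ft(xs, xs, nodes, h=0.25)  # transform the function f(x) = x
```

For the linear function f(x) = x the interior components coincide with the node positions (comps[2] is 0.5 up to floating-point error), which is one concrete reflection of why triangular partitions behave so well in practice.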

Olga Kosheleva, Vladik Kreinovich
Enhanced F-transform Exemplar Based Image Inpainting

This paper focuses on the completion of a partially damaged image. There is a variety of techniques to deal with this task. Our contribution belongs to the group of exemplar-based image inpainting techniques, which process an image that has been divided into many small regions. These regions are called patches, and the task of inpainting becomes the task of searching for the most suitable patch from the undamaged part of the image to replace a partially damaged one. Our novelty lies in processing based on fuzzy mathematics and a new filling-order prioritization function.

Pavel Vlašánek
A Novel Approach to the Discrete Fuzzy Transform of Higher Degree

In this paper, we propose a new approach to the discrete fuzzy transform of higher degree based on the piecewise constant representation of discrete functions and the application of the continuous fuzzy transform. We show how a given discrete function can be reconstructed using the discrete higher-degree fuzzy transform and how conveniently the latter is computed by the novel approach. Finally, we illustrate the proposed technique and compare it with the original discrete fuzzy transform of higher degree.

Linh Nguyen, Michal Holčapek
Lattice-Valued F-Transforms as Interior Operators of L-Fuzzy Pretopological Spaces

The focus is on two spaces with a weaker structure than that of a fuzzy topology. The first one is a fuzzy pretopological space, and the second one is a space with an L-fuzzy partition. For a fuzzy pretopological space, we prove that it can be determined by a Čech interior operator and that the latter can be represented by a reflexive fuzzy relation. For a space with an L-fuzzy partition, we show that a lattice-valued $F^{\downarrow}$-transform is a strong Čech-Alexandrov fuzzy interior operator. Conversely, we find conditions that guarantee that a given L-fuzzy pretopology determines the L-fuzzy partition and the corresponding $F^{\downarrow}$-transform operator.

Irina Perfilieva, S. P. Tiwari, Anand P. Singh
Modified F-transform Based on B-splines

The aim of this paper is to improve the F-transform technique based on B-splines. A modification of the F-transform of higher degree with respect to fuzzy partitions based on B-splines is done to extend the good approximation properties from the interval where the Ruspini condition is fulfilled to the whole interval under consideration. The effect of the proposed modification is characterized theoretically and illustrated numerically.

Martins Kokainis, Svetlana Asmuss
Collocation Method for Linear BVPs via B-spline Based Fuzzy Transform

The paper is devoted to an application of a modified F-transform technique based on B-splines in solving linear boundary value problems via the collocation method. An approximate solution is sought as a composite F-transform of a discrete function (which allows the solution to be compactly stored as the values of this discrete function). We demonstrate the effectiveness of the described technique with numerical examples, compare it with other methods and propose theoretical results on the order of approximation when the fuzzy partition is based on cubic B-splines.

Martins Kokainis, Svetlana Asmuss

Imprecise Probabilities: Foundations and Applications

Frontmatter
Natural Extension of Choice Functions

We extend the notion of natural extension, which gives the least committal extension of a given assessment, from the theory of sets of desirable gambles to that of choice functions. We give an expression for this natural extension and characterise its existence by means of a property called avoiding complete rejection. We prove that our notion indeed reduces to the standard one in the case of choice functions determined by binary comparisons, and that these are not general enough to determine all coherent choice functions. Finally, we investigate the compatibility of the notion of natural extension with the structural assessment of indifference between a set of options.

Arthur Van Camp, Enrique Miranda, Gert de Cooman
Approximations of Coherent Lower Probabilities by 2-monotone Capacities

We investigate the problem of approximating a coherent lower probability on a finite space by a 2-monotone capacity that is as close as possible to it while not including additional information. We show that this can be tackled by means of a linear programming problem, and investigate the features of the set of undominated solutions. While our approach is based on a distance proposed by Baroni and Vicig, we also discuss a number of alternatives. Finally, we show that our work applies to the more general problem of approximating coherent lower previsions.

Ignacio Montes, Enrique Miranda, Paolo Vicig
Web Apps and Imprecise Probabilities

We propose a model for the behaviour of Web apps in the unreliable WWW. Web apps are described by orchestrations. An orchestration mimics the personal use of the Web by defining the way in which Web services are invoked. The WWW is unreliable, as poorly maintained Web sites are prone to fail. We model this source of unreliability through a probabilistic approach: we assume that each site has a probability of failing. Another source of uncertainty is traffic congestion, which can be observed as non-deterministic behaviour induced by the variability in response times. We model non-determinism by imprecise probabilities. We develop here an ex-ante semantics to characterize the behaviour of finite orchestrations in the unreliable Web, and we show the existence of a normal form under such semantics for orchestrations using asymmetric parallelism.

Jorge Castro, Joaquim Gabarro, Maria Serna
Conditional Submodular Coherent Risk Measures

A family of conditional risk measures is introduced by considering a single-period financial market, relying on a notion of conditioning for submodular capacities which generalizes that introduced by Dempster. The resulting measures are expressed as discounted conditional Choquet expected values, take into account ambiguity towards uncertainty, and allow for conditioning on "null" events. We also provide a characterisation of the consistency of a partial assessment with a conditional submodular coherent risk measure. The latter amounts to testing the solvability of a suitable sequence of linear systems.

Giulianella Coletti, Davide Petturiti, Barbara Vantaggi

Mathematical Fuzzy Logic and Mathematical Morphology

Frontmatter
On the Structure of Group-Like FL-chains

Hahn’s celebrated embedding theorem asserts that linearly ordered Abelian groups embed in the lexicographic product of real groups [13]. In this paper the partial-lexicographic product construction is introduced, and a class of residuated monoids, namely group-like $FL_e$-chains possessing finitely many idempotents, is embedded into finite partial-lexicographic products of linearly ordered Abelian groups; that is, Hahn’s theorem is extended to this class of residuated monoids. As a side result, the finite strong standard completeness of the logic $\mathbf{IUL}^{fp}$ is announced.

Sándor Jenei
Logics for Strict Coherence and Carnap-Regular Probability Functions

In this paper we provide a characterization of strict coherence in terms of the logical consistency of suitably defined formulas in fuzzy-modal logics for probabilistic reasoning. As a direct consequence of our characterization, we also show the decidability of the problem of checking the strict coherence of rational-valued books on classical events. Further, we introduce a fuzzy modal logic that captures Carnap-regular probability functions, that is, normalized and finitely additive measures which map only the impossible event to 0.

Tommaso Flaminio
Connecting Systems of Mathematical Fuzzy Logic with Fuzzy Concept Lattices

In this paper our aim is to explore a new look at formal systems of fuzzy logic using the framework of (fuzzy) formal concept analysis (FCA). Let L be an extension of MTL complete with respect to a given L-chain. We investigate two possible approaches. The first is to consider fuzzy formal contexts arising from L where attributes are identified with L-formulas and objects with L-evaluations: every L-evaluation (object) satisfies a formula (attribute) to a given degree, and vice versa. The corresponding fuzzy concept lattices are shown to be isomorphic to quotients of the Lindenbaum algebra of L. The second, following an idea in a previous paper by two of the authors for the particular case of Gödel fuzzy logic, is to use a result by Ganter and Wille in order to interpret the (lattice reduct of the) Lindenbaum algebra of L-formulas as a (classical) concept lattice of a given context.

Pietro Codara, Francesc Esteva, Lluis Godo, Diego Valota
Spatio-Temporal Drought Identification Through Mathematical Morphology

Droughts are initiated by a lack of precipitation over a large area and a long period of time. In order to be able to estimate the possible impacts of droughts, it is important to identify and characterise these events. Describing a drought is, however, not such an easy task as it represents a spatio-temporal phenomenon, with no clear start and ending, trailing from one place to another. This study tries to objectively identify droughts in space and time by applying operators from mathematical morphology. On the basis of the identified droughts, OWA operators are employed to characterise the events in order to aid farmers, water managers, etc. in coping with these events.

Hilde Vernieuwe, Bernard De Baets, Niko E. C. Verhoest

Measures of Comparison and Entropies for Fuzzy Sets and Their Extensions

Frontmatter
On Dissimilarity Measures at the Fuzzy Partition Level

On the one hand, a user vocabulary is often used by soft-computing-based approaches to generate a linguistic and subjective description of numerical and categorical data. On the other hand, knowledge extraction strategies (e.g. association rule discovery or clustering) may be applied to help the user understand the inner structure of the data. To apply knowledge extraction techniques to subjective and linguistic rewritings of the data, one first has to address the question of defining a dedicated distance metric. Many knowledge extraction techniques indeed rely on the use of a distance metric, whose properties have a strong impact on the relevance of the extracted knowledge. In this paper, we propose a measure that computes the dissimilarity between two items rewritten according to a user vocabulary.

Grégory Smits, Olivier Pivert, Toan Ngoc Duong
Monotonicity of a Profile of Rankings with Ties

A common problem in social choice theory concerns the aggregation of the rankings expressed by several voters. Two different settings are often discussed depending on whether the aggregate is assumed to be a latent true ranking that voters try to identify or a compromise ranking that (partially) satisfies most of the voters. In a previous work, we introduced the notion of monotonicity of a profile of rankings and used it for statistically testing the existence of this latent true ranking. In this paper, we consider different extensions of this property to the case in which voters provide rankings with ties.

Raúl Pérez-Fernández, Irene Díaz, Susana Montes, Bernard De Baets
Consistency Properties for Fuzzy Choice Functions: An Analysis with the Łukasiewicz t-norm

In continuation of the research in Alcantud and Díaz [1], we investigate the relationships between consistency axioms in the framework of fuzzy choice functions. In order to help disclose the role of a t-norm in such analyses, we start to study the situation that arises when we use other t-norms instead. We conclude that unless we impose further structure on the domain of application for the choices, the use of the Łukasiewicz t-norm as a replacement for the minimum t-norm does not guarantee a better performance.

Susana Díaz, José Carlos R. Alcantud, Susana Montes
Entropy and Monotonicity

Measuring the information provided by the observation of events has been a challenge for seventy years, since the simultaneous inception of entropy by Claude Shannon and Norbert Wiener in 1948. Various definitions have been proposed, depending on the context, the point of view and the chosen knowledge representation. We show here that one of the most important common features in the choice of an entropy is its behavior with regard to the refinement of information, and we analyse various definitions of monotonicity.
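As one concrete instance of the monotonicity-under-refinement property discussed, Shannon entropy never decreases when an outcome is split into sub-outcomes. A minimal check (the example distribution is of course illustrative):

```python
from math import log2

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

coarse = [0.5, 0.5]            # two equiprobable outcomes
fine = [0.5, 0.25, 0.25]       # the second outcome refined into two halves
print(shannon_entropy(coarse))  # 1.0
print(shannon_entropy(fine))    # 1.5
```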

Bernadette Bouchon-Meunier, Christophe Marsala
On the Problem of Comparing Ordered Ordinary Fuzzy Multisets

In this work we deal with a particular type of hesitant fuzzy set in which membership values can appear multiple times and are ordered; these are called ordered ordinary fuzzy multisets. Some operations between them are introduced by means of an extension principle. In particular, divergence measures between two of these multisets are defined, and the local family of divergences is studied in detail. Finally, these measures are related to the ones given for ordinary fuzzy sets.

Ángel Riesgo, Pedro Alonso, Irene Díaz, Vladimír Janiš, Vladimír Kobza, Susana Montes

New Trends in Data Aggregation

Frontmatter
The Median Procedure as an Example of Penalty-Based Aggregation of Binary Relations

The aggregation of binary relations is a common topic in many fields of application such as social choice and cluster analysis. In this paper, we discuss how the median procedure – probably the most common method for aggregating binary relations – fits in the framework of penalty-based data aggregation.
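When the penalty is the total symmetric-difference distance and no structural constraint (such as transitivity) is imposed, the median procedure reduces to pairwise majority. A hedged sketch of that unconstrained case, with illustrative relations:

```python
from itertools import product

def median_relation(relations, universe):
    """Median under the symmetric-difference penalty: a pair enters the
    median relation iff it occurs in a strict majority of the inputs,
    which minimises the total symmetric difference pair by pair."""
    return {pair for pair in product(universe, repeat=2)
            if 2 * sum(pair in r for r in relations) > len(relations)}

r1 = {("a", "b")}
r2 = {("a", "b"), ("b", "c")}
r3 = {("b", "c")}
print(sorted(median_relation([r1, r2, r3], ["a", "b", "c"])))
# [('a', 'b'), ('b', 'c')]
```

Note that this pairwise-majority median may violate transitivity; restricting the output to a class of relations such as rankings turns the problem into the much harder Kemeny aggregation.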

Raúl Pérez-Fernández, Bernard De Baets
Least Median of Squares (LMS) and Least Trimmed Squares (LTS) Fitting for the Weighted Arithmetic Mean

We look at different approaches to learning the weights of the weighted arithmetic mean such that the median residual, or the sum of the smallest half of the squared residuals, is minimized. The more general problem of multivariate regression has been well studied in the statistical literature; however, in the case of aggregation functions we have a restriction on the weights, and the domain is also usually restricted so that 'outliers' may not be arbitrarily large. A number of algorithms are compared in terms of accuracy and speed. Our results can be extended to other aggregation functions.
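The LTS objective described above can be illustrated with a naive random search over the weight simplex. This is not one of the algorithms the paper compares, just a sketch of the objective; the data and weights (0.7, 0.3) are invented for the example:

```python
import random

def lts_weighted_mean(X, y, trials=3000, seed=7):
    """Random-search sketch of Least Trimmed Squares fitting of the weights
    of a weighted arithmetic mean (non-negative weights summing to 1):
    minimise the sum of the h smallest squared residuals."""
    random.seed(seed)
    n, d = len(X), len(X[0])
    h = n // 2 + 1                       # number of residuals kept after trimming
    best_w, best_loss = None, float("inf")
    for _ in range(trials):
        raw = [random.random() for _ in range(d)]
        s = sum(raw)
        w = [r / s for r in raw]         # sample a point on the weight simplex
        res = sorted((sum(wi * xi for wi, xi in zip(w, row)) - yi) ** 2
                     for row, yi in zip(X, y))
        loss = sum(res[:h])
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# y follows the weights (0.7, 0.3) except for one gross outlier,
# which the trimmed objective simply ignores.
X = [(1, 0), (0, 1), (1, 1), (0.5, 0.5), (2, 0), (1, 0)]
y = [0.7, 0.3, 1.0, 0.5, 1.4, 5.0]       # last observation is the outlier
w, loss = lts_weighted_mean(X, y)
```

An ordinary least-squares fit would be dragged toward the outlier, whereas the trimmed loss recovers weights close to (0.7, 0.3).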

Gleb Beliakov, Marek Gagolewski, Simon James
Combining Absolute and Relative Information in Studies on Food Quality

A common problem in food science concerns the assessment of the quality of food samples. Typically, a group of panellists is trained exhaustively on how to identify different quality indicators in order to provide absolute information, in the form of scores, for each given food sample. Unfortunately, this training is expensive and time-consuming. For this very reason, it is quite common to search for additional information provided by untrained panellists. However, untrained panellists usually provide relative information, in the form of rankings, for the food samples. In this paper, we discuss how both scores and rankings can be combined in order to improve the quality of the assessment.

Marc Sader, Raúl Pérez-Fernández, Bernard De Baets
Twofold Binary Image Consensus for Medical Imaging Meta-Analysis

In the field of medical imaging, ground truth is often gathered from groups of experts, whose outputs are generally heterogeneous. This procedure raises questions on how to compare the results obtained by automatic algorithms to multiple ground truth items. Secondarily, it raises questions on the meaning of the divergences between experts. In this work, we focus on the case of immunohistochemistry image segmentation and analysis. We propose measures to quantify the divergence in groups of ground truth images, and we observe their behaviour. These measures are based upon fusion techniques for binary images, a common example of a non-monotone data fusion process. Our measures can be used not only in this specific field of medical imaging, but also in any task related to meta-quality evaluation for image processing, e.g. ground truth validation or expert rating.

Carlos Lopez-Molina, Javier Sanchez Ruiz de Gordoa, Victoria Zelaya-Huerta, Bernard De Baets

Pre-aggregation Functions and Generalized Forms of Monotonicity

Frontmatter
Penalty-Based Functions Defined by Pre-aggregation Functions

The pre-aggregation function (PAF) is an important concept that has emerged in the context of directionally monotone functions. Such functions satisfy the same boundary conditions as aggregation functions, but monotone increasingness is not required on the whole domain, just in some fixed directions. On the other hand, penalty functions are another important concept for decision making applications, since they can provide a measure of deviation from the consensus value given by averaging aggregation functions, or a penalty for not having such consensus. This paper studies penalty-based functions defined by PAFs. We analyse some properties (e.g. idempotency, averaging behavior and shift-invariance), providing a characterization of idempotent penalty-based PAFs and a weak characterization of averaging penalty-based PAFs. The use of penalty-based PAFs in spatial/tonal filters is outlined.

Graçaliz Pereira Dimuro, Radko Mesiar, Humberto Bustince, Benjamín Bedregal, José Antonio Sanz, Giancarlo Lucca
Strengthened Ordered Directional and Other Generalizations of Monotonicity for Aggregation Functions

A tendency in the theory of aggregation functions is the generalization of the monotonicity condition. In this work, we examine the latest developments in terms of different generalizations. In particular, we discuss strengthened ordered directional monotonicity, its relation to other types of monotonicity, such as directional and ordered directional monotonicity, and the main properties of the class of functions that are strengthened ordered directionally monotone. We also study some construction methods for such functions and provide a characterization of usual monotonicity in terms of these notions of monotonicity.

Mikel Sesma-Sara, Laura De Miguel, Julio Lafuente, Edurne Barrenechea, Radko Mesiar, Humberto Bustince
A Study of Different Families of Fusion Functions for Combining Classifiers in the One-vs-One Strategy

In this work we study the usage of different families of fusion functions for combining classifiers in a multiple classifier system of One-vs-One (OVO) classifiers. OVO is a decomposition strategy used to deal with multi-class classification problems, where the original multi-class problem is divided into as many problems as there are pairs of classes. In a multiple classifier system, classifiers coming from different paradigms, such as support vector machines, rule induction algorithms or decision trees, are combined. In the literature, several works have addressed the usage of classifier selection methods for these kinds of systems, where the best classifier for each pair of classes is selected. In this work, we look at the problem from a different perspective, aiming at analyzing the behavior of different families of fusion functions to combine the classifiers. In fact, a multiple classifier system of OVO classifiers can be seen as a multi-expert decision making problem. In this context, for the fusion functions depending on weights or fuzzy measures, we propose to obtain these parameters from data. Backed up by a thorough experimental analysis, we show that the fusion function to be considered is a key factor in the system. Moreover, those based on weights or fuzzy measures can allow one to better model the aggregation problem.

Mikel Uriz, Daniel Paternain, Aranzazu Jurio, Humberto Bustince, Mikel Galar

Rough and Fuzzy Similarity Modelling Tools

Frontmatter
Object [Re]Cognition with Similarity

We discuss the origin of the notion of similarity, the basic concepts connected with it, and some methods of representing this conception in a mathematical setting. We present a framework of recognition based on multi-aspect similarity. The framework is implemented in the form of a network of comparators that processes similarity expressed in terms of fuzzy sets. Our approach introduces a new standard to the field of similarity computing and processing.

Łukasz Sosnowski, Julian Skirzyński
Attribute Reduction of Set-Valued Decision Information System

In practice, we may obtain set-valued data due to limitations of the acquisition means or the requirements of practical problems. In this paper, we focus on how to reduce set-valued decision information systems under the disjunctive semantics. First, a new relation measuring the degree of similarity between two set-valued objects is defined, which overcomes the limitations of existing measures. Second, an attribute reduction algorithm for set-valued decision information systems is proposed. Finally, the experimental results demonstrate that the proposed method can simplify set-valued decision information systems and achieve higher classification accuracy than existing methods.
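For intuition, a similarity degree between two set-valued attribute values can be sketched with the classical Jaccard index. This is an illustrative measure only; the paper defines its own relation precisely to overcome the limitations of measures like this one.

```python
def set_similarity(x, y):
    """Degree of similarity between two set-valued attribute values,
    sketched with the Jaccard index |x & y| / |x | y|.
    (Illustrative only; not the relation defined in the paper.)"""
    if not x and not y:
        return 1.0  # two empty values are trivially identical
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

# One shared language out of three distinct ones.
print(set_similarity({"en", "fr"}, {"fr", "de"}))  # -> 0.333...
```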

Jun Hu, Siyu Huang, Rui Shao
Defuzzyfication in Interpretation of Comparator Networks

We present an extension to the methods and algorithms for approximation of similarity known as Networks of Comparators. By interpreting the output of the network as a discrete fuzzy set, we make it possible to employ various defuzzyfication techniques to establish a unique output value of the comparator network. We illustrate the advantages of this approach with two examples.
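One common defuzzyfication technique that could be applied to such a discrete fuzzy set is the centroid (membership-weighted average) method, sketched below. The dictionary representation and sample values are assumptions for illustration, not taken from the paper.

```python
def centroid_defuzzify(fuzzy_set):
    """Centroid defuzzification of a discrete fuzzy set given as a
    {value: membership} mapping: the membership-weighted mean value."""
    den = sum(fuzzy_set.values())
    if den == 0:
        raise ValueError("all memberships are zero")
    num = sum(v * m for v, m in fuzzy_set.items())
    return num / den

# Hypothetical network output as a discrete fuzzy set over [0, 1].
out = {0.0: 0.1, 0.5: 0.6, 1.0: 0.3}
print(centroid_defuzzify(out))  # close to 0.6
```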

Łukasz Sosnowski, Marcin Szczuka
A Comparison of Characteristic Sets and Generalized Maximal Consistent Blocks in Mining Incomplete Data

We discuss two interpretations of missing attribute values: lost values and "do not care" conditions. Both interpretations may be used for data mining based on characteristic sets. On the other hand, maximal consistent blocks were originally defined for incomplete data sets with "do not care" conditions, using only lower and upper approximations. We extend the definitions of maximal consistent blocks to both interpretations while using probabilistic approximations, a generalization of lower and upper approximations. Our main objective is to compare approximations based on characteristic sets with approximations based on maximal consistent blocks in terms of error rate.

Patrick G. Clark, Cheng Gao, Jerzy W. Grzymala-Busse, Teresa Mroczek
Rules Induced from Rough Sets in Information Tables with Continuous Values

Rule induction based on neighborhood rough sets is described for information tables with continuous values. The indiscernible range of a value in an attribute is determined by a threshold on that attribute, and the indiscernibility relation is derived using these indiscernible ranges. First, lower and upper approximations are described in complete information tables by directly using the indiscernibility relation, and rules are obtained from the approximations. To improve the applicability of rules, a series of rules is merged into one rule expressed with an interval value, called a combined rule. Second, the same notions are addressed in incomplete information tables, where incomplete information is expressed by a set of values or an interval value. The indiscernibility relations are constructed from two viewpoints, certainty and possibility, yielding four types of approximations: certain lower, certain upper, possible lower, and possible upper. Using these approximations, rough sets are expressed by interval sets, and we obtain four types of combined rules: certain and consistent, certain and inconsistent, possible and consistent, and possible and inconsistent. These combined rules have greater applicability than the single rules supported by individual objects.

Michinori Nakata, Hiroshi Sakai, Keitarou Hara
How to Match Jobs and Candidates - A Recruitment Support System Based on Feature Engineering and Advanced Analytics

We describe a recruitment support system that aims to help recruiters find candidates who are likely to be interested in a given job offer. We present the architecture of the system and explain the roles of its main modules. We also give examples of analytical processes supported by the system. In the paper, we focus on a data processing chain that utilizes domain knowledge to extract meaningful features representing pairs of candidates and offers. Moreover, we discuss the use of a word2vec model for finding concise vector representations of the offers, based on their short textual descriptions. Finally, we present the results of an empirical evaluation of our system.

Andrzej Janusz, Sebastian Stawicki, Michał Drewniak, Krzysztof Ciebiera, Dominik Ślęzak, Krzysztof Stencel
Similarity-Based Accuracy Measures for Approximate Query Results

We introduce a new approach to the empirical evaluation of the accuracy of select statement results produced by a relational approximate query engine. We emphasize what similarity between approximate and exact query outcomes means from the perspective of the practical applicability of approximate query processing solutions. We propose how to design a similarity-based procedure that lets us compare approximate and exact versions of the results of complex queries. We not only offer a measure of the accuracy of query results, but also describe the results of research on users' intuitions regarding the properties of such a measure, as well as their perception of query results as similar. The study is supported by theoretical and empirical analyses of different similarity functions and a case study of investigative analytics over data sets related to network intrusion detection.

Agnieszka Chądzyńska-Krasowska
A Linear Model for Three-Way Analysis of Facial Similarity

Card sorting was used to gather information about facial similarity judgments. A group of raters put a set of facial photos into an unrestricted number of different piles according to each rater’s judgment of similarity. This paper proposes a linear model for 3-way analysis of similarity. An overall rating function is a weighted linear combination of ratings from individual raters. A pair of photos is considered to be similar, dissimilar, or divided, respectively, if the overall rating function is greater than or equal to a certain threshold, is less than or equal to another threshold, or is between the two thresholds. The proposed framework for 3-way analysis of similarity is complementary to studies of similarity based on features of photos.
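The three-way rule described in the abstract (overall rating at least one threshold implies similar, at most the other implies dissimilar, in between implies divided) can be sketched directly. The binary ratings and sample weights below are assumptions for illustration; the paper derives the weights and thresholds from the card-sorting data.

```python
def classify_pair(ratings, weights, alpha, beta):
    """Three-way decision on a photo pair.

    ratings: individual raters' judgments, e.g. 1 if a rater put both
    photos in the same pile, else 0 (a hypothetical encoding).
    weights: the raters' weights; alpha >= beta are the two thresholds.
    """
    overall = sum(w * r for w, r in zip(weights, ratings))
    if overall >= alpha:
        return "similar"
    if overall <= beta:
        return "dissimilar"
    return "divided"

# Two of three raters (with most of the weight) judged the pair similar.
print(classify_pair([1, 1, 0], [0.5, 0.3, 0.2], alpha=0.7, beta=0.3))
# -> similar
```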

Daryl H. Hepting, Hadeel Hatim Bin Amer, Yiyu Yao
Empirical Comparison of Distances for Agglomerative Hierarchical Clustering

This paper proposes a method for the empirical comparison of distances for agglomerative hierarchical clustering based on rough set-based approximation. When a set of targets is given, the level of the clustering tree at which one branch includes all the targets can be traced, together with the number of elements included. The pair (#clusters at a level, #elements of a cluster) can be viewed as an index pair for a given clustering tree.

Shusaku Tsumoto, Tomohiro Kimura, Haruko Iwata, Shoji Hirano

Soft Computing for Decision Making in Uncertainty

Frontmatter
Missing Data Imputation by LOLIMOT and FSVM/FSVR Algorithms with a Novel Approach: A Comparative Study

The occurrence of missing values is an inherent part of collecting data sets for real-world problems. This issue causes many ambiguities when analyzing and processing data sets. Therefore, implementing methods that can handle missing data is critical in many fields, in order to provide accurate, efficient and valid analyses. In this paper, we propose a novel preprocessing approach that estimates and imputes missing values in data sets using the state-of-the-art LOLIMOT and FSVM/FSVR algorithms. Classification accuracy is used as a scale for comparing the precision and efficiency of the presented approach with several other well-known methods. The obtained results show that the proposed approach is the most accurate one.

Fatemeh Fazlikhani, Pegah Motakefi, Mir Mohsen Pedram
Two Modifications of the Automatic Rule Base Synthesis for Fuzzy Control and Decision Making Systems

This paper presents two modifications of a method for the synthesis and optimization of the rule bases (RB) of fuzzy systems (FS) for decision making and control of complex technical objects under conditions of uncertainty. To illustrate the advantages of the proposed method, the RB of a Mamdani-type fuzzy controller (FC) is developed for the automatic control system (ACS) of the reactor temperature of an experimental specialized pyrolysis plant (SPP). The efficiency of the presented method of synthesis and optimization of the FS RB is investigated and compared with other existing methods on the basis of this FC. Analysis of the simulation results confirms the high efficiency of the method of synthesis and reduction of the FS RB proposed by the authors.

Yuriy P. Kondratenko, Oleksiy V. Kozlov, Oleksiy V. Korobko
Decision Making Under Incompleteness Based on Soft Set Theory

Decision making with complete and accurate information is ideal but infrequent. Unfortunately, in most cases the available information is vague, imprecise, uncertain or unknown. The theory of soft sets provides an appropriate framework for decision making that may be used to deal with uncertain decisions. The aim of this paper is to propose and analyze an effective algorithm for multiple attribute decision-making based on soft set theory in an incomplete information environment, when the distribution of incomplete data is unknown. This procedure provides an accurate solution through a combinatorial study of possible cases in the unknown data. Our theoretical development is complemented by practical examples that show the feasibility and implementability of this algorithm. Moreover, we review recent research on decision making from the standpoint of the theory of soft sets under incomplete information.

José Carlos R. Alcantud, Gustavo Santos-García
Intelligent Decision Support System for Selecting the University-Industry Cooperation Model Using Modified Antecedent-Consequent Method

This work is devoted to the analysis and selection of the most rational model of university/IT-company cooperation (UIC) using intelligent decision support systems (DSSs) under conditions of input information uncertainty. The modification of a two-cascade method for reconfiguring the fuzzy DSS's rule bases is described in detail for situations in which the volume of input data can change. The authors propose an additional observer procedure for checking the fuzzy rule consequents before their final correction. The modified method provides (a) structural reduction of the rule antecedents, (b) correction of the corresponding consequents in an interactive mode and (c) avoidance of result deformation in decision making with a variable structure of input data. Special attention is paid to hierarchically organized DSSs (with a variable input vector and discrete logic output) and to the design of the web-oriented instrumental tool WOTFS-1. The simulation results confirm the efficiency and expediency of using (a) the WOTFS-1 software and (b) the modified method of antecedent-consequent reconfiguration of fuzzy rule bases for the efficient selection of the rational model of academia-industry cooperation.

Yuriy P. Kondratenko, Galyna Kondratenko, Ievgen Sidenko
Strategy to Managing Mixed Datasets with Missing Items

The paper addresses the problem of decision making and of choosing appropriate ways to decrease the level of input information uncertainty related to the absence or unavailability of some values in mixed data sets. Approaches to addressing missing data and evaluating their performance are discussed, and a generalized strategy for managing data with missing values is proposed. The study is based on real pregnancy-related records of 186 patients from 12 to 42 weeks of gestation. Three missing data techniques (complete ignoring, case deletion, and random forest (RF) imputation) were applied to medical data of various types, under a missing-completely-at-random assumption, to solve a classification task and soften the negative impact of input information uncertainty. The efficiency of the approaches to dealing with missingness was evaluated. The results demonstrate that case deletion and ignoring missing values are the least suitable for handling mixed types of missing data, and suggest RF imputation as a useful approach for imputing complex pregnancy-related data sets with missing data.

Inna Skarga-Bandurova, Tetiana Biloborodova, Yuriy Dyachenko
Predicting Opponent Moves for Improving Hearthstone AI

Games pose many interesting questions for the development of artificial intelligence agents. Especially popular are methods that guide the decision-making process of an autonomous agent tasked with playing a certain game. In previous studies, the heuristic search method Monte Carlo Tree Search (MCTS) was successfully applied to a wide range of games, with results showing that it can often reach playing capabilities on par with humans or better. However, the characteristics of collectible card games such as the online game Hearthstone make it infeasible to apply MCTS directly. Uncertainty in the opponent's hand cards, the card draw, and random card effects considerably restrict the simulation depth of MCTS. We show that knowledge gathered from a database of human replays helps to overcome this problem by predicting multiple card distributions. These predictions can be used to increase the simulation depth of MCTS. For this purpose, we calculate bigram rates of frequently co-occurring cards to predict multiple sets of hand cards for our opponent. The predictions can then be used to create an ensemble of MCTS agents, each working under the assumption of a different card distribution and performing simulations accordingly. The proposed ensemble approach outperforms other agents on the game Hearthstone, including various types of MCTS. Our case study shows that uncertainty can be handled effectively using predictions of sufficient accuracy, ultimately improving the MCTS-guided decision-making process. The resulting decision making based on such an MCTS ensemble proved to be less prone to errors caused by uncertainty and opens up a new class of MCTS algorithms.
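The bigram-rate computation mentioned above can be sketched as a conditional co-occurrence count over observed decks. The data layout and card names below are hypothetical; the paper's replay database and exact estimator may differ.

```python
from collections import Counter
from itertools import combinations

def bigram_rates(replays):
    """Estimate conditional co-occurrence rates of card pairs.

    replays: a list of card sets observed in past games.
    Returns, for each ordered pair (a, b), the fraction of observed
    decks containing card a that also contain card b.
    """
    single = Counter()
    pair = Counter()
    for deck in replays:
        cards = sorted(set(deck))
        single.update(cards)
        pair.update(combinations(cards, 2))  # pairs in sorted order
    rates = {}
    for (a, b), n in pair.items():
        rates[(a, b)] = n / single[a]
        rates[(b, a)] = n / single[b]
    return rates

# Toy replay database: Fireball co-occurs with Frostbolt in 2 of 3 decks.
replays = [{"Fireball", "Frostbolt"},
           {"Fireball", "Frostbolt"},
           {"Fireball", "Swipe"}]
rates = bigram_rates(replays)
print(rates[("Fireball", "Frostbolt")])  # 2 of 3 Fireball decks
```

Rates like these can rank candidate hand cards given the cards the opponent has already played, which is what lets each ensemble member simulate under a concrete assumed distribution.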

Alexander Dockhorn, Max Frick, Ünal Akkaya, Rudolf Kruse
A New Generic Framework for Argumentation-Based Negotiation Using Case-Based Reasoning

The growing use of information technology in automated negotiation leads to an urgent need to find alternatives to traditional protocols. New tools from fields such as Artificial Intelligence (AI) should be considered in the process of developing novel protocols, in order to make the negotiation process simpler, faster and more realistic. This paper proposes a new generic, domain-independent framework based on both argumentation and Case-Based Reasoning (CBR) as a means of guiding the negotiation process to a settlement, overcoming the limits of domain-dependent frameworks. The proposed framework was tested in the tourism domain using real data.

Rihab Bouslama, Raouia Ayachi, Nahla Ben Amor

Soft Computing in Information Retrieval and Sentiment Analysis

Frontmatter
Obtaining WAPO-Structure Through Inverted Indexes

In order to represent texts while preserving their semantics, in earlier work we proposed the WAPO-Structure, an intermediate form of representation that keeps related terms together. This intermediate form can be visualized through a tag cloud, which in turn serves as a textual navigation and retrieval tool. WAPO-Structures were obtained through a modification of the APriori algorithm, which spends a lot of processing time computing frequent sequences, as it must perform numerous passes over the text until it finds the frequent sequences of maximal level. In this paper we present an alternative method for generating the WAPO-Structure from the inverted indexes of the text. This method saves processing time for texts whose inverted index has already been computed.

Úrsula Torres-Parejo, Jesús R. Campaña, Maria-Amparo Vila, Miguel Delgado
Automatic Expansion of Spatial Ontologies for Geographic Information Retrieval

One of the most prominent scenarios for capturing implicit knowledge from heterogeneous data sources concerns the geospatial data domain. In this scenario, ontologies play a key role in managing the totality of geospatial concepts, categories and relations at different resolutions. However, the manual development of geographic ontologies is an exhausting task due to the rapid growth of the data available on the Internet. To address this challenge, the present work describes a semi-automatic approach to build and expand a geographic ontology by integrating the information provided by diverse spatial data sources. The generated ontology can be used as a knowledge resource in a Geographic Information Retrieval system. As a main novelty, using OWL 2 as the ontology language allowed us to model and infer new spatial relationships, in contrast to less expressive languages such as RDF or OWL 1. Two spatial ontologies were generated for two specific geographic regions by applying the proposed approach, and the evaluation results showed their suitability as geographic-knowledge resources in Geographic Information Retrieval contexts.

Manuel E. Puebla-Martínez, José M. Perea-Ortega, Alfredo Simón-Cuevas, Francisco P. Romero
Using Syntactic Analysis to Enhance Aspect Based Sentiment Analysis

Many companies and corporations are interested in the opinions that users share about them on different social media. Sentiment analysis provides a powerful tool to discern the polarity of the opinion about a particular object or service, which makes it an important research field nowadays. In this paper we present a method to perform the sentiment analysis of a sentence through its syntactic analysis, by generating Prolog code from the parse tree of the sentence, which is automatically produced using natural language processing tools. This is preliminary work, which provides encouraging results.

Juan Moreno-Garcia, Jesús Rosado
A Probabilistic Author-Centered Model for Twitter Discussions

In recent work some of the authors developed an argumentative approach for discovering relevant opinions in Twitter discussions with probabilistic valued relationships. Given a Twitter discussion, the system builds an argument graph where each node denotes a tweet and each edge denotes a criticism relationship between a pair of tweets in the discussion. Relationships between tweets are associated with a probability value indicating the uncertainty about whether they actually hold. In this work we introduce and investigate a natural extension of the representation model, referred to as the probabilistic author-centered model. In this model, tweets by the same author are grouped together, describing his or her opinion in the discussion, and are represented by a single node in the graph, while edges stand for criticism relationships between authors' opinions. In this new model, interactions between authors can give rise to circular criticism relationships, and the probability of one opinion criticizing another is evaluated from the criticism probabilities among the individual tweets of both opinions.
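One plausible way to evaluate an opinion-level criticism probability from tweet-level probabilities is a noisy-OR: the opinion criticizes another if at least one of its tweets does, assuming independence. This is an illustrative assumption on our part; the paper's exact aggregation rule may differ.

```python
from math import prod

def opinion_criticism_prob(tweet_probs):
    """Probability that one author's opinion criticizes another's,
    sketched as a noisy-OR over the tweet-level criticism
    probabilities between the two opinions (an assumed aggregation,
    not necessarily the paper's)."""
    return 1.0 - prod(1.0 - p for p in tweet_probs)

# Two tweet pairs with criticism probabilities 0.5 and 0.2.
print(round(opinion_criticism_prob([0.5, 0.2]), 2))  # -> 0.6
```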

Teresa Alsinet, Josep Argelich, Ramón Béjar, Francesc Esteva, Lluis Godo
A Concept-Based Text Analysis Approach Using Knowledge Graph

The large and growing amounts of unstructured text available on the Internet and in other scenarios are becoming a very valuable source of information and knowledge. The present work describes a concept-based text analysis approach based on the use of a knowledge graph for structuring the content of texts and a query language for retrieving relevant information and obtaining knowledge from the automatically generated knowledge graph. In the querying process, a semantic analysis method is applied to search and integrate the conceptual structures of the knowledge graph, supported by a disambiguation algorithm and WordNet. The applicability of the proposed approach was evaluated in the analysis of scientific articles from a systematic literature review, and the results were contrasted with the conclusions obtained by the authors of that review.

Wenny Hojas-Mazo, Alfredo Simón-Cuevas, Manuel de la Iglesia Campos, Francisco P. Romero, José A. Olivas

Tri-partitions and Uncertainty

Frontmatter
An Efficient Gradual Three-Way Decision Cluster Ensemble Approach

Cluster ensembles have emerged as a powerful technique for combining multiple clustering results. However, existing cluster ensemble approaches are usually restricted to two-way clustering and cannot flexibly provide a two-way or three-way clustering result as needed. The main objective of this paper is to propose a general cluster ensemble framework for both two-way and three-way decision clustering. A cluster is represented by three regions: the positive region, the boundary region and the negative region. This three-way representation intuitively shows which objects are on the fringe of the cluster. In this work, the number of ensemble members is increased gradually in each decision (iteration), unlike existing cluster ensemble methods in which all available ensemble members join the computation in a single decision. The process can end with a final three-way decision clustering, or continue until every object is assigned definitively to the positive or negative region of a cluster. The experimental results show that the proposed gradual three-way decision cluster ensemble approach effectively reduces the running time without sacrificing accuracy.
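The three-region representation of a cluster can be sketched as follows: objects with high membership go to the positive region, objects with low membership to the negative region, and the rest to the boundary. This illustrates only the representation, not the gradual ensemble algorithm; the membership values and thresholds are hypothetical.

```python
def three_way_regions(membership, alpha, beta):
    """Split objects into the positive, boundary and negative regions
    of a cluster from their membership degrees, using a pair of
    thresholds alpha > beta (illustrative values assumed)."""
    pos = {o for o, m in membership.items() if m >= alpha}
    neg = {o for o, m in membership.items() if m <= beta}
    bnd = set(membership) - pos - neg  # fringe objects
    return pos, bnd, neg

# Object "b" sits on the fringe of the cluster.
membership = {"a": 0.9, "b": 0.5, "c": 0.1}
pos, bnd, neg = three_way_regions(membership, alpha=0.8, beta=0.2)
print(pos, bnd, neg)  # {'a'} {'b'} {'c'}
```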

Hong Yu, Guoyin Wang
Modes of Sequential Three-Way Classifications

We present a framework for studying sequential three-way classifications based on a sequence of description spaces and a sequence of evaluation functions. In each stage, a pair of a description space and an evaluation function is used for a three-way classification. A set of objects is classified into three regions. The positive region contains positive instances of a given class, the negative region contains negative instances, and the boundary region contains those objects that cannot be classified as positive or negative instances due to insufficient information. By using finer description spaces and finer evaluations, we may be able to make definite classifications for those objects in the boundary region in multiple steps, which gives a sequential three-way classification. We examine four particular modes of sequential three-way classifications with respect to multiple levels of granularity, probabilistic rough set theory, multiple models of classification, and ensemble classifications.

Yiyu Yao, Mengjun Hu, Xiaofei Deng
Determining Strategies in Game-Theoretic Shadowed Sets

A three-way approximation of shadowed sets maps the membership grades of all objects into a three-valued set with a pair of thresholds. Game-theoretic shadowed sets (GTSS) determine and interpret a pair of thresholds for three-way approximations based on a principle of tradeoff with games. GTSS formulate competitive games between the elevation and reduction errors. The players start from the initial thresholds (1,0) and perform certain strategies to change the thresholds during the game. The games are repeated with the updated thresholds to gradually reach suitable thresholds. However, starting from a pair of randomly selected non-(1,0) thresholds has not been examined in GTSS. We propose a game approach that makes it possible for GTSS to start from a pair of randomly selected thresholds, and then determines the strategies associated with them. In particular, given a pair of randomly chosen initial thresholds, we use a game mechanism to determine the directions of change that players prefer to apply to the initial thresholds. The proposed approach supplements GTSS and can be added in the game formulation and repetition learning phases. We explain the game formulation, the equilibrium analysis, and the determination of strategies in this paper. An example demonstrates how the proposed approach can supplement GTSS to obtain the thresholds of three-way approximations of shadowed sets when starting from randomly selected thresholds.

Yan Zhang, JingTao Yao
Three-Way and Semi-supervised Decision Tree Learning Based on Orthopartitions

Decision Tree Learning is one of the most popular machine learning techniques. A common problem with this approach is the inability to properly manage uncertainty and inconsistency in the underlying datasets. In this work we propose two generalized Decision Tree Learning models based on the notion of Orthopair: the first method allows the induced classifiers to abstain on certain instances, while the second one works with unlabeled outputs, thus enabling semi-supervised learning.

Andrea Campagner, Davide Ciucci
Backmatter
Metadata
Title
Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations
Editors
Jesús Medina
Manuel Ojeda-Aciego
José Luis Verdegay
David A. Pelta
Inma P. Cabrera
Prof. Dr. Bernadette Bouchon-Meunier
Ronald R. Yager
Copyright Year
2018
Electronic ISBN
978-3-319-91476-3
Print ISBN
978-3-319-91475-6
DOI
https://doi.org/10.1007/978-3-319-91476-3
