
2019 | Book

Uncertainty Modelling in Data Science

Edited by: Sébastien Destercke, Thierry Denoeux, María Ángeles Gil, Przemyslaw Grzegorzewski, Olgierd Hryniewicz

Publisher: Springer International Publishing

Book Series: Advances in Intelligent Systems and Computing


About this book

This book features 29 peer-reviewed papers presented at the 9th International Conference on Soft Methods in Probability and Statistics (SMPS 2018), which was held in conjunction with the 5th International Conference on Belief Functions (BELIEF 2018) in Compiègne, France on September 17–21, 2018. It includes foundational, methodological and applied contributions on topics as varied as imprecise data handling, linguistic summaries, model coherence, imprecise Markov chains, and robust optimisation. These proceedings were produced using EasyChair.

Over recent decades, interest in extensions and alternatives to probability and statistics has increased significantly in diverse areas, including decision-making, data mining and machine learning, and optimisation. This interest stems from the need to enrich existing models, in order to include different facets of uncertainty, like ignorance, vagueness, randomness, conflict or imprecision. Frameworks such as rough sets, fuzzy sets, fuzzy random variables, random sets, belief functions, possibility theory, imprecise probabilities, lower previsions, and desirable gambles all share this goal, but have emerged from different needs.

The advances, results and tools presented in this book are important in the ubiquitous and fast-growing fields of data science, machine learning and artificial intelligence. Indeed, an important aspect of some of the learned predictive models is the trust placed in them.

Modelling the uncertainty associated with the data and the models carefully and with principled methods is one of the means of increasing this trust, as the model will then be able to distinguish between reliable and less reliable predictions. In addition, extensions such as fuzzy sets can be explicitly designed to provide interpretable predictive models, facilitating user interaction and increasing trust.

Table of Contents

Frontmatter
Imprecise Statistical Inference for Accelerated Life Testing Data: Imprecision Related to Log-Rank Test
Abstract
In this paper we consider an imprecise predictive inference method for accelerated life testing. The method is largely nonparametric, with a basic parametric function to link different stress levels. We discuss in detail how we use the log-rank test to provide adequate imprecision for the link function parameter.
Abdullah A. H. Ahmadini, Frank P. A. Coolen
Descriptive Comparison of the Rating Scales Through Different Scale Estimates: Simulation-Based Analysis
Abstract
In dealing with intrinsically imprecise-valued magnitudes, a common rating scale type is the natural-language-based Likert scale. Over the last decades, fuzzy scales (more concretely, fuzzy linguistic scales/variables and fuzzy rating scales) have also been considered for rating values of these magnitudes. This study performs a comparative descriptive analysis focused on the variability/dispersion associated with the magnitude, depending on the rating scale considered. Fuzzy rating responses are simulated and associated with Likert responses by means of a ‘Likertization’ criterion. Then, each ‘Likertized’ datum is encoded by means of a fuzzy linguistic scale. In this way, with the responses available in all three scales, the values of the different dispersion estimators are calculated and compared across the scales.
Irene Arellano, Beatriz Sinova, Sara de la Rosa de Sáa, María Asunción Lubiano, María Ángeles Gil
Central Moments of a Fuzzy Random Variable Using the Signed Distance: A Look Towards the Variance
Abstract
The central moments of a random variable are extensively used to understand the characteristics of distributions in classical statistics. It is well known that the second central moment of a given random variable is simply its variance. When fuzziness in the data occurs, the situation becomes much more complicated. The central moments of a fuzzy random variable are often very difficult to calculate because of the analytical complexity associated with the product of two fuzzy numbers, so an approximation is needed. Our research showed that the so-called signed distance is a valuable tool for this task. The main contribution of this paper is to present the central moments of a fuzzy random variable using this distance. Furthermore, since we are interested in the statistical measures of the distribution, particularly the variance, we pay particular attention to its estimation using the signed distance. Using this distance to approximate the square of a fuzzy difference, we can obtain an unbiased estimator of the variance. Finally, we prove that under some conditions our methodology related to the signed distance returns an exact crisp variance.
Rédina Berkachy, Laurent Donzé
On Missing Membership Degrees: Modelling Non-existence, Ignorance and Inconsistency
Abstract
In real-world applications, mathematical models must often deal with values that are missing or undefined. The aim of this paper is to provide a survey of the types of and reasons for such non-availability. It motivates the need to handle each reason for missingness in a different but appropriate way. In particular, non-existence, ignorance, and inconsistency are studied. The paper also presents a novel way to compute with different types of missing values at the same time.
Michal Burda, Petra Murinová, Viktor Pavliska
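To make the distinction concrete, here is a minimal Python sketch (our own illustration, not the calculus defined in the paper) of how a fuzzy conjunction might treat two missing-value types differently: non-existence propagates unconditionally, while ignorance can still be absorbed by a known zero. The marker names and the simplified Kleene-style rule are assumptions.

```python
# Truth values are floats in [0, 1]; two distinct markers model missingness.
UNKNOWN = "unknown"      # a membership degree exists but is not observed (ignorance)
UNDEFINED = "undefined"  # no membership degree exists at all (non-existence)

def conj(a, b):
    """Minimum-based conjunction extended to both missing-value types:
    UNDEFINED propagates unconditionally (Bochvar-style), while UNKNOWN
    can still be resolved by a known absorbing value (Kleene-style)."""
    if UNDEFINED in (a, b):
        return UNDEFINED
    if a == UNKNOWN and b == UNKNOWN:
        return UNKNOWN
    if a == UNKNOWN:
        return 0.0 if b == 0.0 else UNKNOWN
    if b == UNKNOWN:
        return 0.0 if a == 0.0 else UNKNOWN
    return min(a, b)

assert conj(0.7, 0.4) == 0.4           # ordinary fuzzy conjunction
assert conj(0.0, UNKNOWN) == 0.0       # zero absorbs even an unknown degree
assert conj(0.7, UNDEFINED) == UNDEFINED  # non-existence propagates
```

Note that returning a bare `UNKNOWN` discards interval information (e.g. `conj(UNKNOWN, 0.3)` is really bounded by 0.3); a faithful treatment would track such bounds.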
Characterization of Conditional Submodular Capacities: Coherence and Extension
Abstract
We provide a representation of an axiomatically defined conditional submodular capacity in terms of a linearly ordered class of (unconditional) submodular capacities. This allows us to provide a notion of coherence for a partial assessment and a related notion of coherent extension.
Giulianella Coletti, Davide Petturiti, Barbara Vantaggi
Some Partial Order Relations on a Set of Random Variables
Abstract
Reciprocal relations, i.e. [0, 1]-valued relations Q satisfying \(Q(a,b)+Q(b,a)=1\), provide a convenient tool for expressing the result of the pairwise comparison of a set of alternatives.
Bernard De Baets, Hans De Meyer
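The defining property can be checked numerically. The sketch below is our own illustration (the function name and the sample-based estimator are assumptions, not taken from the paper): it estimates \(Q(a,b) = P(X_a > X_b) + 0.5\,P(X_a = X_b)\) from two finite samples of equally likely values.

```python
from itertools import product

def reciprocal_relation(xs, ys):
    """Estimate Q(a, b) = P(X_a > X_b) + 0.5 * P(X_a = X_b)
    from two samples of equally likely values."""
    wins = sum(1 for x, y in product(xs, ys) if x > y)
    ties = sum(1 for x, y in product(xs, ys) if x == y)
    return (wins + 0.5 * ties) / (len(xs) * len(ys))

a, b = [1, 3, 5], [2, 2, 6]
q_ab, q_ba = reciprocal_relation(a, b), reciprocal_relation(b, a)
assert abs(q_ab + q_ba - 1.0) < 1e-12  # Q(a,b) + Q(b,a) = 1 by construction
```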
A Desirability-Based Axiomatisation for Coherent Choice Functions
Abstract
Choice functions constitute a simple, direct and very general mathematical framework for modelling choice under uncertainty. In particular, they are able to represent the set-valued choices that typically arise from applying decision rules to imprecise-probabilistic uncertainty models. We provide them with a clear interpretation in terms of attitudes towards gambling, borrowing ideas from the theory of sets of desirable gambles, and we use this interpretation to derive a set of basic axioms. We show that these axioms lead to a full-fledged theory of coherent choice functions, which includes a representation in terms of sets of desirable gambles, and a conservative inference method.
Jasper De Bock, Gert de Cooman
Cycle-Free Cuts of the Reciprocal Relation Generated by Random Variables that are Pairwisely Coupled by a Frank Copula
Abstract
Some years ago, we investigated the transitivity properties of the reciprocal relation generated from the pairwise comparison of the components of a random vector [2].
Hans De Meyer, Bernard De Baets
Density Estimation with Imprecise Kernels: Application to Classification
Abstract
In this paper, we explore the problem of estimating lower and upper densities from imprecisely defined families of parametric kernels. Such estimates allow one to rely on a single bandwidth value, and we show that this provides good results on classification tasks when extending the naive Bayesian classifier.
Guillaume Dendievel, Sebastien Destercke, Pierre Wachalski
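A minimal sketch of the lower/upper density idea, assuming (as an illustration, not as the paper's exact construction) that the imprecise family consists of Gaussian kernels with bandwidths ranging over a finite set; the function names are ours.

```python
import math

def gaussian_kde(x, data, h):
    """Plain Gaussian kernel density estimate at x with bandwidth h."""
    norm = h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) / norm for d in data) / len(data)

def imprecise_kde(x, data, bandwidths):
    """Lower and upper density estimates over a finite family of
    kernels that differ only in their bandwidth."""
    estimates = [gaussian_kde(x, data, h) for h in bandwidths]
    return min(estimates), max(estimates)

data = [0.0, 0.4, 0.9, 1.3, 2.1]
lower, upper = imprecise_kde(1.0, data, [0.25, 0.5, 1.0])
assert lower <= upper
```

The pair `(lower, upper)` then bounds the density at each point, sidestepping the choice of a single best bandwidth.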
Z-numbers as Generalized Probability Boxes
Abstract
This paper proposes a new approach to the notion of a Z-number, i.e., a pair (A, B) of fuzzy sets modeling a probability-qualified fuzzy statement, proposed by Zadeh. Originally, a Z-number is viewed as the fuzzy set of probability functions stemming from the flexible restriction of the probability of the fuzzy event A by the fuzzy interval B on the probability scale. However, a probability-qualified statement represented by a Z-number fails to come down to the original fuzzy statement when the attached probability is 1. This representation also leads to complex calculations. It is shown that simpler representations avoiding these pitfalls can be proposed, starting from the remark that when both fuzzy sets A and B forming the Z-number are crisp, the generated set of probabilities is representable by a special kind of belief function that corresponds to a probability box (p-box). Two proposals are then made to generalize this approach when the two sets are fuzzy. One idea is to consider a Z-number as a weighted family of crisp Z-numbers, obtained by independent cuts of the two fuzzy sets, that can be averaged. In the other approach, a Z-number comes down to a pair of possibility distributions on the universe of A forming a generalized p-box. With our proposal, computation with Z-numbers comes down to uncertainty propagation with random intervals.
Didier Dubois, Henri Prade
Computing Inferences for Large-Scale Continuous-Time Markov Chains by Combining Lumping with Imprecision
Abstract
If the state space of a homogeneous continuous-time Markov chain is too large, making inferences—here limited to determining marginal or limit expectations—becomes computationally infeasible. Fortunately, the state space of such a chain is usually too detailed for the inferences we are interested in, in the sense that a less detailed—smaller—state space suffices to unambiguously formalise the inference. However, in general this so-called lumped state space inhibits computing exact inferences because the corresponding dynamics are unknown and/or intractable to obtain. We address this issue by considering an imprecise continuous-time Markov chain. In this way, we are able to provide guaranteed lower and upper bounds for the inferences of interest, without suffering from the curse of dimensionality.
Alexander Erreygers, Jasper De Bock
Robust Fuzzy Relational Clustering of Non-linear Data
Abstract
In many practical situations, data may be characterized by non-linear structures. Classical (hard or fuzzy) algorithms, usually based on the Euclidean distance, implicitly lead to spherical clusters and, therefore, do not identify clusters properly. In this paper we deal with non-linear structures in clustering by means of the geodesic distance, which is able to capture and preserve the intrinsic geometry of the data. We introduce a new fuzzy relational clustering algorithm based on the geodesic distance. Furthermore, to improve its adequacy, a robust version is proposed that takes into account the presence of outliers.
Maria Brigida Ferraro, Paolo Giordani
Measures of Dispersion for Interval Data
Abstract
Almost all experiments reveal variability in their results. In this contribution we consider measures of dispersion for a sample of random intervals. In particular, we suggest a generalization of two well-known classical measures of dispersion, the range and the interquartile range, to interval-valued samples.
Przemyslaw Grzegorzewski
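Under the epistemic (possible-values) reading of interval data, a generalized range is itself an interval: the set of ranges attainable by precise samples compatible with the observed intervals. The sketch below is our own illustration of this idea, not the paper's estimator.

```python
def interval_range(sample):
    """Tightest interval containing all possible sample ranges,
    where each observation is an interval (l, u) of compatible values."""
    lowers = [l for l, _ in sample]
    uppers = [u for _, u in sample]
    # Largest range: spread the precise values as far apart as possible.
    upper_range = max(uppers) - min(lowers)
    # Smallest range: cluster them; zero if all intervals share a point.
    lower_range = max(0.0, max(lowers) - min(uppers))
    return lower_range, upper_range

# Three interval-valued observations with no common point
assert interval_range([(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)]) == (4.0, 6.0)
# Overlapping intervals: the range may be as small as zero
assert interval_range([(0.0, 2.0), (1.0, 3.0)]) == (0.0, 3.0)
```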
A Maximum Likelihood Approach to Inference Under Coarse Data Based on Minimax Regret
Abstract
Various methods have been proposed to express and solve maximum likelihood problems with incomplete data. In some of these approaches, the idea is that incompleteness makes the likelihood function imprecise. Two proposals can be found to cope with this situation: maximize the maximal likelihood induced by the precise datasets compatible with the incomplete observations, or maximize the minimal such likelihood. These approaches prove to be extreme, the maximax approach having a tendency to disambiguate the data, while the maximin approach favors uniform distributions. In this paper we propose an alternative approach based on a minimax relative regret criterion with respect to the maximum likelihood solutions obtained for all precise datasets compatible with the coarse data. It uses relative likelihood and seems to achieve a trade-off between the maximax and maximin methods.
Romain Guillaume, Didier Dubois
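The criterion can be illustrated on a toy coarse Bernoulli sample; the grid search and variable names below are our own sketch under simplifying assumptions, not the paper's algorithm. Each coarse observation is replaced by the set of compatible precise values, and the estimate maximises the worst-case likelihood ratio relative to the best completion-specific likelihood.

```python
from itertools import product

def likelihood(p, sample):
    """Bernoulli likelihood of a fully observed 0/1 sample."""
    out = 1.0
    for x in sample:
        out *= p if x == 1 else 1.0 - p
    return out

# Coarse data: each observation is the set of values compatible with it.
coarse = [{1}, {0}, {0, 1}, {0, 1}]
completions = list(product(*coarse))

grid = [i / 100 for i in range(1, 100)]
# Reference: the best achievable likelihood for each precise completion.
best = {c: max(likelihood(p, c) for p in grid) for c in completions}
# Minimax relative regret: maximise the worst-case likelihood ratio.
p_star = max(grid, key=lambda p: min(likelihood(p, c) / best[c] for c in completions))
```

In this toy example `p_star` is 0.5, sitting between the completions that a maximax approach would commit to.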
Monitoring of Time Series Using Fuzzy Weighted Prediction Models
Abstract
We consider the monitoring of processes described by autocorrelated time series data. For this purpose, we propose to use the Shewhart control chart for residuals, designed using the \(WAM*\) approach introduced by the authors. The main problem to be solved in the design of the proposed control chart is the choice of the weight \(w_0\) assigned to the model estimated directly from data. In this paper, we propose a method for choosing the optimal value of \(w_0\) using fuzzy values of the parameters describing a monitored process.
Olgierd Hryniewicz, Katarzyna Kaczmarek-Majer
Control Charts Designed Using Model Averaging Approach for Phase Change Detection in Bipolar Disorder
Abstract
Bipolar disorder is a mental illness affecting over 1% of the world’s population. In the course of the disease, there are episodic fluctuations between different mood phases, ranging from depression to manic episodes and mixed states. Early detection and treatment of prodromal symptoms of an affective episode recurrence is crucial, since it reduces the conversion rate to full-blown illness and decreases symptom severity. This can be achieved by monitoring mood stability with the use of data collected from patients’ smartphones. We provide an illustrative example of the application of control charts to generate early and reliable notifications about a change of the bipolarity phase. Our charts are designed with the weighted model averaging approaches WAM* and WAMs for the detection of disturbances in the stability of the monitored processes. The models are selected in a novel way using the autocorrelation functions. The proposed approach delivers results that have a clear psychiatric interpretation. Control charts based on weighted model averaging are a promising tool for monitoring patients suffering from bipolar disorder, especially in the case of a limited amount of diagnostic data.
Katarzyna Kaczmarek-Majer, Olgierd Hryniewicz, Karol R. Opara, Weronika Radziszewska, Anna Olwert, Jan W. Owsiński, Sławomir Zadrożny
An Imprecise Probabilistic Estimator for the Transition Rate Matrix of a Continuous-Time Markov Chain
Abstract
We consider the problem of estimating the transition rate matrix of a continuous-time Markov chain from a finite-duration realisation of this process. We approach this problem in an imprecise probabilistic framework, using a set of prior distributions on the unknown transition rate matrix. The resulting estimator is a set of transition rate matrices that, for reasons of conjugacy, is easy to find. To determine the hyperparameters for our set of priors, we reconsider the problem in discrete time, where we can use the well-known Imprecise Dirichlet Model. In particular, we show how the limit of the resulting discrete-time estimators is a continuous-time estimator. It corresponds to a specific choice of hyperparameters and has an exceptionally simple closed-form expression.
Thomas Krak, Alexander Erreygers, Jasper De Bock
Imprecise Probability Inference on Masked Multicomponent System
Abstract
Outside the scope of controlled experiments, we have only limited information available to carry out desired inferences. One such scenario is when we wish to infer the topology of a system given only data representing system lifetimes, without information about the states of the components at the time of system failure, and only limited information about the lifetimes of the components of which the system is composed. This scenario, masked system inference, has been studied before for systems with only one component type, with the aim of inferring both the system topology and the lifetime distribution of the components composing it. In this paper we study a similar scenario in which we consider systems consisting of multiple types of components. We assume that the distribution of component lifetimes is known to belong to a pre-specified set of distributions, and our intention is to reflect this information via a set of likelihood functions, which will be used to obtain an imprecise posterior on the set of considered system topologies.
Daniel Krpelik, Frank P. A. Coolen, Louis J. M. Aslett
Regression Ensemble with Linguistic Descriptions
Abstract
In this contribution, we briefly present a method that automatically creates an ensemble of regression techniques, and we compare this method to standard approaches. This is done with the help of a mined linguistic rule base, which is further used by advanced Perception-based Logical Deduction. As a possible side effect, we can obtain a linguistic description of the evaluative process.
Jiří Kupka, Pavel Rusnok
Dynamic Classifier Selection Based on Imprecise Probabilities: A Case Study for the Naive Bayes Classifier
Abstract
Dynamic classifier selection is a classification technique that, for every new instance to be classified, selects and uses the most competent classifier among a set of available ones. In this way, a new classifier is obtained, whose accuracy often outperforms that of the individual classifiers it is based on. We here present a version of this technique where, for a given instance, the competency of a classifier is based on the robustness of its prediction: the extent to which the classifier can be altered without changing its prediction. In order to define and compute this robustness, we adopt methods from the theory of imprecise probabilities. As a proof of concept, we here apply this idea to the simple case of naive Bayes classifiers. Based on our preliminary experiments, we find that the resulting classifier outperforms the individual classifiers it is based on.
Meizhu Li, Jasper De Bock, Gert de Cooman
Case Study-Based Sensitivity Analysis of Scale Estimates w.r.t. the Shape of Fuzzy Data
Abstract
For practical purposes, and to ease both the drawing and the computing processes, the fuzzy rating scale was originally introduced assuming that values based on such a scale are modeled by means of trapezoidal fuzzy numbers. In this paper, to determine whether or not such an assumption is too restrictive, we examine, on the basis of a real-life example, how statistical conclusions concerning location-based scale estimates are affected by the shape chosen to model imprecise data with fuzzy numbers. The discussion will be descriptive for the considered scale estimates, but for the Fréchet-type variance it will also be inferential. The study leads us to conclude that statistical conclusions are scarcely influenced by data shape.
María Asunción Lubiano, Carlos Carleos, Manuel Montenegro, María Ángeles Gil
Compatibility, Coherence and the RIP
Abstract
We generalise the classical result on the compatibility of marginal, possibly non-disjoint, assessments in terms of the running intersection property to the imprecise case, where our beliefs are modelled in terms of sets of desirable gambles. We consider the case where we have both unconditional and conditional assessments, and show that the problem can be simplified via a tree decomposition.
Enrique Miranda, Marco Zaffalon
Estimation of Classification Probabilities in Small Domains Accounting for Nonresponse Relying on Imprecise Probability
Abstract
Nonresponse treatment is usually carried out by imposing strong assumptions regarding the response process in order to achieve point identifiability of the parameters of interest. Problematically, such assumptions are usually not readily testable, and fallaciously imposing them may lead to severely biased estimates. In this paper we develop generalized Bayesian imprecise probability methods for the estimation of proportions under potentially nonignorable nonresponse using data from small domains. Namely, we generalize the imprecise Beta model to this setting, treating missing values in a cautious way. Additionally, we extend the empirical Bayes model introduced by Stasny (1991, JASA) by considering a set of priors arising, for instance, from neighborhoods of maximum likelihood estimates of the hyperparameters. We reanalyze data from the American National Crime Survey to estimate the probability of victimization in domains formed by the cross-classification of certain characteristics.
Aziz Omar, Thomas Augustin
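A minimal sketch of the cautious-bounds idea behind an imprecise Beta model: the function, the choice s = 2, and the worst/best-case imputation of nonrespondents are our simplified illustration, not the authors' full generalized Bayesian treatment.

```python
def imprecise_beta_bounds(k, n, m, s=2.0):
    """Lower/upper posterior expectations of a success probability under an
    imprecise Beta model: priors Beta(s*t, s*(1-t)) for all t in (0, 1),
    with the m nonrespondents treated cautiously (all failures / all successes)."""
    total = n + m + s
    lower = k / total            # t -> 0 and every nonrespondent a failure
    upper = (k + m + s) / total  # t -> 1 and every nonrespondent a success
    return lower, upper

# 80 respondents (30 with the characteristic), 20 nonrespondents in a small domain
lower, upper = imprecise_beta_bounds(k=30, n=80, m=20)
assert lower < upper  # nonresponse widens the interval instead of biasing a point
```

The width of `[lower, upper]` reflects both prior imprecision and the nonresponse; as m grows relative to n, the interval widens, making the lack of identifiability explicit.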
Beyond Doss and Fréchet Expectation Sets
Abstract
In this work, we study a family of sets which generalizes the notions of Fréchet and Doss expectation for a random variable and a random set. They appear as a natural generalization of the Fréchet functional. We study their main properties, paying special attention to the cases of finite and finite-dimensional metric spaces.
Juan Jesus Salamanca
Empirical Comparison of the Performance of Location Estimates of Fuzzy Number-Valued Data
Abstract
Several location measures have already been proposed in the literature in order to summarize the central tendency of a random fuzzy number in a robust way. Among them, fuzzy trimmed means and fuzzy M-estimators of location extend two successful approaches from the real-valued settings. The aim of this work is to present an empirical comparison of different location estimators, including both fuzzy trimmed means and fuzzy M-estimators, to study their differences in finite sample behaviour.
Beatriz Sinova, Stefan Van Aelst
Continuity of the Shafer-Vovk-Ville Operator
Abstract
Kolmogorov’s axiomatic framework is the best-known approach to describing probabilities and, due to its use of the Lebesgue integral, leads to remarkably strong continuity properties. However, it relies on the specification of a probability measure on all measurable events. The game-theoretic framework proposed by Shafer and Vovk does without this restriction. They define global upper expectation operators using local betting options. We study the continuity properties of these more general operators. We prove that they are continuous with respect to upward convergence and show that this is not the case for downward convergence. We also prove a version of Fatou’s Lemma in this more general context. Finally, we prove their continuity with respect to point-wise limits of two-sided cuts.
Natan T’Joens, Gert de Cooman, Jasper De Bock
Choquet Theorem for Random Sets in Polish Spaces and Beyond
Abstract
A fundamental long-standing problem in the theory of random sets is concerned with the possible characterization of the distributions of random closed sets in Polish spaces via capacities. Such a characterization is known in the locally compact case (the Choquet theorem) in two equivalent forms: using the compact sets and the open sets as test sets. The general case has remained elusive. We solve the problem in the affirmative using open test sets.
Pedro Terán
Generalising the Pari-Mutuel Model
Abstract
We introduce two models for imprecise probabilities which generalise the Pari-Mutuel Model while retaining its simple structure. Their consistency properties are investigated, as well as their capability of formalising an assessor’s different attitudes. It turns out that one model is always coherent, while the other is (occasionally coherent but) generally only 2-coherent, and may elicit a conflicting attitude towards risk.
Chiara Corsato, Renato Pelessoni, Paolo Vicig
A Net Premium Model for Life Insurance Under a Sort of Generalized Uncertain Interest Rates
Abstract
In this paper, we apply LR-fuzzy random variables to estimate a discount function associated with a generalized interest rate, as well as fuzzy random variable future lifetimes, and establish some life annuity and endowment models. A novel fuzzy net premium model is obtained. Finally, a statistical simulation is given to illustrate the proposed models.
Dabuxilatu Wang
Backmatter
Metadata
Title
Uncertainty Modelling in Data Science
Edited by
Sébastien Destercke
Thierry Denoeux
María Ángeles Gil
Przemyslaw Grzegorzewski
Olgierd Hryniewicz
Copyright Year
2019
Electronic ISBN
978-3-319-97547-4
Print ISBN
978-3-319-97546-7
DOI
https://doi.org/10.1007/978-3-319-97547-4