scroll identifier for mobile
main-content

## Über dieses Buch

In many practical situations, we are interested in statistics characterizing a population of objects: e.g. in the mean height of people from a certain area.

Most algorithms for estimating such statistics assume that the sample values are exact. In practice, sample values come from measurements, and measurements are never absolutely accurate. Sometimes, we know the exact probability distribution of the measurement inaccuracy, but often, we only know the upper bound on this inaccuracy. In this case, we have interval uncertainty: e.g. if the measured value is 1.0, and inaccuracy is bounded by 0.1, then the actual (unknown) value of the quantity can be anywhere between 1.0 - 0.1 = 0.9 and 1.0 + 0.1 = 1.1. In other cases, the values are expert estimates, and we only have fuzzy information about the estimation inaccuracy.

This book shows how to compute statistics under such interval and fuzzy uncertainty. The resulting methods are applied to computer science (optimal scheduling of different processors), to information technology (maintaining privacy), to computer engineering (design of computer chips), and to data processing in geosciences, radar imaging, and structural mechanics.

## Inhaltsverzeichnis

### Formulation of the Problem

Need

for

statistical

analysis

. In many areas of science and engineering, we have a class (“population”) of objects, and we are interested in the values of one or several quantities characterizing objects from this population. For example, we are interested in the human population in a certain region, and we are interested in their heights, weights, etc.

Different objects from a population have, in general, different values of the desired characteristics. Measuring, storing, and processing

all

these values is often not practically feasible. Instead of measuring all these values, we therefore measure the values

x

1

,...,

x

n

corresponding to a

sample

from the population. Based on these sample values, we need to estimate

statistical

characteristics

c

– i.e., characteristics that describe the population as a whole, such as the mean and the variance of different quantities, the correlation between different quantities, etc.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Statistics under Probabilistic and Interval Uncertainty: A Brief Description

Computing

statistics

under

probabilistic

uncertainty

. In the case of

probabilistic

uncertainty

, we know the probability distributions for measurement errors corresponding to all the inputs

x

1

,...,

x

n

, and we want to find the probability distribution corresponding to the statistic

C

(

x

1

,...,

x

n

).

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Statistics under Fuzzy Uncertainty: Formulation of the Problem

Need

to

process

fuzzy

uncertainty

. In many practical situations, we only have expert estimates for the inputs

x

i

. Sometimes, experts provide guaranteed bounds on the

x

i

, and even the probabilities of different values within these bounds. However, such cases are rare. Usually, the experts’ opinions about the uncertainty of their estimates are described by (imprecise, “fuzzy”) words from natural language. For example, an expert can say that the value

x

i

of the

i

-th quantity is approximately equal to 1.0, with an accuracy most probably of about 0.1.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Fuzzy Uncertainty Can Be Reduced to Computing under Interval Uncertainty

An

alternative

set

representation

of

a

membership

function

:

alpha

−

cuts

. To describe the desired relation between computing under fuzzy uncertainty and computing under interval uncertainty, we must first reformulate fuzzy techniques in an interval-related form.

In some situations, an expert knows exactly which values of

x

i

are possible and which are not. In this situation, the expert’s knowledge can be naturally represented by describing the set of all possible values.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Interval Uncertainty: Traditional Approach Based on Uniform Distributions

statistical

approach

:

main

idea

. In the case of interval uncertainty, we only know the intervals, we do not know the probability distributions on these intervals. The traditional statistical approach to situations in which we have several alternatives with unknown probabilities is to use Laplace Principle of Indifference, according to which,

if we have several possible alternatives,

and we have no information about the probability of different alternatives,

we assume all these probabilities to be equal.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Interval Uncertainty: When Measurement Errors Are Small

Linearization

:

main

idea

. When the measurement errors Δ

x

i

are relatively small, we can use a simplification called

linearization

. The main idea of linearization is as follows.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Interval Uncertainty: General Algorithms

Need

for

interval

computations

. In many application areas, it is sufficient to have an approximate estimate of

y

– e.g., an estimate obtained from linearization. However, in some applications, it is important to guarantee that the (unknown) actual value

y

of a certain quantity does not exceed a certain threshold

y

0. The only way to guarantee this is to have an interval

Y

= [

Y

,

$\overline{Y}$

] which is guaranteed to contain

y

(i.e., for which

y

Y

) and for which

$\overline{Y}$

y

0.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Interval Uncertainty: Computational Complexity

In this chapter, we will briefly describe the computational complexity of the range estimation problem under interval uncertainty.

Linear

case

. Let us start with the simplest case of a linear function

y

=

f

(

x

1

,...,

x

n

) =

a

0

+

$\sum\limits^{n}_{i=1}$

a

i

·

x

i

.

In this case, substituting the (approximate) measured values

$\tilde{x}_{i}$

, we get the approximate value

$\tilde{y}$

=

a

0

+

$\sum\limits^{n}_{i=1}$

a

i

·

$\tilde{x}_{i}$

for

y

.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Towards Selecting Appropriate Statistical Characteristics: The Basics of Decision Theory and the Notion of Utility

In the previous chapter, we mentioned that in general, the problem of estimating statistical characteristics under interval uncertainty is NP-hard. This means, crudely speaking, that it is not possible to design a feasible algorithm that would compute all statistics under interval uncertainty. It is therefore necessary to restrict ourselves to statistical characteristics which are practically useful.

Which statistical characteristics should we estimate? One of the main objectives of data processing is to make decisions. Thus, to find the most appropriate statistical characteristics, let us recall the traditional way of making decisions based on user’s preference: the decision theory.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### How to Select Appropriate Statistical Characteristics

Which

is

the

best

way

to

describe

the

corresponding

probabilistic

uncertainty

? One of the main objectives of data processing is to make decisions. As we have seen in the previous chapter, a standard way of making a decision is to select the action

a

for which the expected utility (gain) is the largest possible. This is where probabilities are used: in computing, for every possible action

a

, the corresponding expected utility. To be more precise, we usually know, for each action

a

and for each actual value of the (unknown) quantity

x

, the corresponding value of the utility

u

a

(

x

). We must use the probability distribution for

x

to compute the expected value e[

u

a

(

x

)] of this utility.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing under Fuzzy Uncertainty Can Be Reduced to Computing under Interval Uncertainty: Reminder

In this part, we present algorithms for computing the values of different statistical characteristics

C

(

x

1

,...,

x

n

) under interval and fuzzy uncertainty.

In Chapter 4, we have explained that the problem of computing these values under fuzzy uncertainty can be reduced to the problem of computing the values of this characteristic under interval uncertainty. Namely, for every

α

∈ [0, 1], the alpha-cut

y

(

α

) of the desired fuzzy value is the interval that corresponds to estimating the same characteristic

C

(

x

1

,...,

x

n

) under interval uncertainty – specifically, under the assumption that the

i

-th input belongs to the

α

-cut of the corresponding fuzzy number

x

i

x

i

(

α

):

y

(

α

) = {

C

(

x

1

,...,

x

n

):

x

1

x

1

(

α

),...,

x

n

x

n

(

α

)}.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Mean under Interval Uncertainty

We have already mentioned that for the interval data

x

1

= [

$\underline{x}_{1}$

,

$\overline{x}_{1}$

],...,

x

n

= [

$\underline{x}_{n}$

,

$\overline{x}_{n}$

], a reasonable estimate for the corresponding statistical characteristic

C

(

x

1

,...,

x

n

) is the range

C

(

x

1

,...,

x

n

)

$\scriptstyle \rm def\atop{=}$

{

C

(

x

1

,...,

x

n

) |

x

1

x

1

,...,

x

n

x

1

}.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Median (and Quantiles) under Interval Uncertainty

Need

to

go

beyond

arithmetic

average

. We have mentioned, in the Formulation of the Problem chapter, that an important source of interval uncertainty is the existence of the lower detection limits for sensors: if a sensor does not detect any signal this means that the actual value of the measured quantity is below its detection limit

DL

, i.e., in the interval [0,

DL

].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Variance under Interval Uncertainty: An Example of an NP-Hard Problem

Formulation

of

the

problem

:

reminder

. In many practical applications, we need to estimate the sample variance

V

=

$\frac{1}{n}$

·

$\sum\limits^{n}_{i=1}{(x_{i} - E)}^{2}$

, where

E

=

$\frac{1}{n}$

·

$\sum\limits^{n}_{i=1}$

x

i

.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Types of Interval Data Sets: Towards Feasible Algorithms

Need

to

consider

specific

types

of

interval

data

sets

. The main objective of this book is to compute statistics under interval uncertainty. The simplest and most widely used statistical characteristics are mean and variance. We already know that computing the mean under interval uncertainty is straightforward. However, as the previous chapter shows, computing variance

V

under interval uncertainty is, in general, an NP-hard (computationally difficult) problem. As we will see in the following chapters, a similar problem is NP-hard for many other statistical characteristics

C

as well.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Variance under Interval Uncertainty: Efficient Algorithms

We have shown that the problem of computing the upper endpoint

$\overline{V}$

is, in general, NP-hard (later on, in this chapter, we will see that the lower endpoint

$\underline{V}$

can be always computed in feasible (polynomial) time). Since we cannot always efficiently compute the upper endpoint

$\overline{V}$

, we therefore need to consider cases when such an efficient computation may be possible.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Variance under Hierarchical Privacy-Related Interval Uncertainty

Need

for

hierarchical

statistical

analysis

. In the above text, we assumed that we have all the data in one large database, and we process this large statistical database to estimate the desired statistical characteristics.

To prevent privacy violations, we replace the original values of the quasiidentifier variables with ranges. For example, we divide the set of all possible ages into ranges [0, 10], [10, 20], [20, 30], etc. Then, instead of storing the actual age of 26, we only store the range [20, 30] which contains the actual age value.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Outlier Thresholds under Interval Uncertainty

In many application areas, it is important to detect outliers. The traditional engineering approach to outlier detection is that we start with some “normal” values

x

1

,...,

x

n

, compute the sample average

E

, the sample standard variation

σ

, and then mark a value

x

as an outlier if

x

is outside the

k

0

-sigma interval [

E

k

0

·

σ

,

E

+

k

0

·

σ

] (for some pre-selected parameter

k

0

). In real life, we often have only interval ranges [

$\underline{x}_{i}$

,

$\overline{x}_{i}$

] for the normal values

x

1

,...,

x

n

. In this case, we only have intervals of possible values for the “outlier threshold” – bounds

E

k

0

·

σ

and

E

+

k

0

·

σ

. We can therefore identify outliers as values that are outside all

k

0

-sigma intervals.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Higher Moments under Interval Uncertainty

Higher central moments

M

h

=

$\frac{1}{n}$

·

$\sum\limits^{n}_{i=1}$

(

x

i

−

E

)

h

are very useful in statistical analysis: the third moment

M

3

characterizes asymmetry of the corresponding probability distribution, the fourth moment

M

4

describes the size of the distribution’s tails, etc. To be more precise,

skewness

M

3

/

σ

3

is used to characterize asymmetry, and

kurtosis

M

4

/

σ

4

– 3 is used to characterize the size of the tails (3 is subtracted so that kurtosis is 0 for the practically frequent case of a normal distribution).

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Mean, Variance, Higher Moments, and Their Linear Combinations under Interval Uncertainty: A Brief Summary

In the previous chapters, we described several results and algorithms for computing:

the mean

E

,

the variance

V

=

σ

2

=

$\frac{1}{n}$

·

$\sum\limits^{n}_{i=1}{(x_{i} - E)}^{2}$

,

more generally, higher central moments

M

h

=

$\frac{1}{n}$

·

$\sum\limits^{n}_{i=1}{(x_{i} - E)}^{h}$

and

statistically useful linear combinations of these characteristics – such as the lower and upper endpoints of the confidence interval

L

=

E

k

0

·

σ

and

U

=

E

+

k

0

·

σ

, where the parameter

k

0

is usually taken as

k

0

= 2,

k

0

= 3, and

k

0

= 6.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Covariance under Interval Uncertainty

When we have two sets of data

x

1

,...,

x

n

and

y

1

,...,

y

n

, we normally compute

finite

population

covariance

C

xy

=

$\frac{1}{n}$

$\sum\limits^{n}_{i=1}{(x_{i} - E_{x})}$

· (

y

i

−

E

y

),

where

E

x

=

$\frac{1}{n}$

$\sum\limits^{n}_{i=1}$

x

i

;

E

y

=

$\frac{1}{n}$

$\sum\limits^{n}_{i=1}$

y

i

.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Correlation under Interval Uncertainty

As we have mentioned in Chapter 21, finite population covariance

C

between the data sets

x

1

,...,

x

n

and

y

1

,...,

y

n

is often used to compute finite population correlation

ρ

=

$\frac{C}{\sigma_{x} \cdot \sigma_{y}}$

,

where

σ

x

=

$\sqrt{V_{x}}$

is the finite population standard deviation of the values

x

1

,...,

x

n

, and

σ

y

=

$\sqrt{V_{y}}$

is the finite population standard deviation of the values

y

1

,...,

y

n

.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Expected Value under Interval Uncertainty

In statistics, the values of the statistical characteristics are estimated either

directly

from the sample

x

1

,...,

x

n

, or

indirectly

: based on the values of other statistical characteristics that have already been estimated based on the sample.

In most of the previous chapters, we considered the situations in which:

we want to compute the statistical characteristics of a probability distribution based on a sample

x

1

,...,

x

n

, and

we know the values

x

i

with interval uncertainty.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Entropy under Interval Uncertainty. I

Measurement results (and, more generally, estimates) are never absolutely accurate: there is always an uncertainty, the actual value

x

is, in general, different from the estimate

$\tilde{x}$

. Sometimes, we know the probability of different values of the estimation error Δx

$\scriptstyle \rm def\atop{=}$

$\tilde{x}$

x

, sometimes, we only know the interval of possible values of Δx, sometimes, we have interval bounds on the cdf of Δx. To compare different measuring instruments, it is desirable to know which of them brings more information – i.e., it is desirable to gauge the amount of information. For probabilistic uncertainty, this amount of information is described by Shannon’s entropy; similar measures can be developed for interval and other types of uncertainty. In this chapter, we start analyzing the problem of estimating information amount under different types of uncertainty.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Entropy under Interval Uncertainty. II

Formulation

of

the

problem

. In most practical situations, our knowledge is incomplete: there are several (

n

) different states which are consistent with our knowledge. How can we gauge this uncertainty? A natural measure of uncertainty is the average number of binary (“yes”-“no”) questions that we need to ask to find the exact state. This idea is behind Shannon’s

information

theory

: according to this theory, when we know the probabilities

p

1

,...,

p

n

of different states (for which ∑

p

i

= 1), then this average number of questions is equal to

S

= –

$\sum\limits^{n}_{i=1}$

p

i

·

log

2

(

p

i

). In information theory, this average number of question is called the amount of information.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing the Range of Convex Symmetric Functions under Interval Uncertainty

In general, a statistical characteristic

f

can be more complex so that even computing

f

can take much longer than linear time. For such

f

, the question is how to compute the range [

y

,

$\overline{y}$

] in as few calls to

f

as possible. We show that for convex symmetric functions

f

, we can compute

$\overline{y}$

in

n

calls to

f

.

The results of this chapter first appeared in [353].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Statistics under Interval Uncertainty: Possibility of Parallelization

In this chapter, we show how the algorithms for estimating variance under interval and fuzzy uncertainty can be parallelized. The results of this chapter first appeared in [336].

Need

for

parallelization

. Traditional algorithms for computing the population variance

V

based on the exact values

x

1

,...,

x

n

take linear time

O

(

n

). Algorithms for estimating variance under interval uncertainty take a larger amount of computation time – e.g., time

O

(

n

·

log

(

n

)). How can we speed up these computations?

If we have several processors, then it is desirable to perform these algorithms in parallel on several processors, and thus, speed up computations. In this chapter, we show how the algorithms for estimating variance under interval and fuzzy uncertainty can be parallelized.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Computing Statistics under Interval Uncertainty: Case of Relative Accuracy

Formulation

of

the

problem

. In the previous chapters, we have shown that for many statistical characteristics

C

, computing them with a given

absolute

accuracy

ε

– i.e., computing a value

$\tilde{C}$

for which |

$\tilde{C}$

– C| ≤

ε

is NP-hard.

It turns out that if we are interested in computing these characteristics with

relative

accuracy – relative with respect to, e.g., the largest of the inputs – then it often possible to estimate these characteristics in polynomial time. These results first appeared in [57, 176].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### How Reliable Is the Input Data?

In traditional interval computations, we assume that the interval data corresponds to guaranteed interval bounds, and that fuzzy estimates provided by experts are correct. In practice, measuring instruments are not 100% reliable, and experts are not 100% reliable, we may have estimates which are “way off”, intervals which do not contain the actual values at all. Usually, we know the percentage of such outlier un-reliable measurements. However, it is desirable to check that the reliability of the actual data is indeed within the given percentage. The problem of checking (gauging) this reliability is, in general, NP-hard; in reasonable cases, there exist feasible algorithms for solving this problem.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### How Accurate Is the Input Data?

Different models can be used to describe real-life phenomena: deterministic, probabilistic, fuzzy, models in which we have interval-valued or fuzzy-valued probabilities, etc. Models are usually not absolutely accurate. It is therefore important to know how accurate is a given model. In other words, it is important to be able to measure a mismatch between the model and the empirical data. In this chapter, we describe an approach of measuring this mismatch which is based on the notion of utility, the central notion of utility theory.

The main results of this chapter first appeared in [206]. In one of the following application chapters (Chapter 35), we show that a similar approach can be used to measure the loss of privacy.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### From Computing Statistics under Interval and Fuzzy Uncertainty to Practical Applications: Need to Propagate the Statistics through Data Processing

Need

for

data

processing

. In many areas of science and engineering, we are interested in a quantity

y

which is difficult (or even impossible) to measure directly. For example, it is difficult to directly measure the distance to a faraway star or the amount of oil in an oil well. To estimate this quantity, we can:

measure auxiliary easier-to-measure (or to estimate) quantities

x

1

,...,

x

n

which are related to

y

by a known dependence

y

=

f

(

x

1

,...,

x

n

), and then

use the results

$\tilde{x}_1,..., \tilde{x}_n$

of measuring (or estimating)

x

i

to compute the estimate

$\tilde{y} = f(\tilde{x}_1,..., \tilde{x}_n$

) for

y

.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Bioinformatics

Formulation

of

the

practical

problem

. In cancer research, it is important to find out the genetic difference between the cancer cells and the healthy cells.

In the ideal world, we should be able to have a sample of cancer cells, and a sample of healthy cells, and thus directly measure the concentrations

c

and

h

of a given gene in cancer and in healthy cells. In reality, it is very difficult to separate the cells, so we have to deal with samples that contain both cancer and normal cells.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Computer Science: Optimal Scheduling for Global Computing

In many practical situations, in particular in many bioinformatics problems, the amount of computations is so huge that the only way to perform these computations in reasonable time is to distribute them between multiple processors. The more processors we engage, the faster the resulting computations; thus, in addition to processor exclusively dedicated to this job, systems often use idle time on other processors. The use of these otherwise engaged processors adds additional uncertainty to computations.

How should we schedule the computational tasks so as to achieve the best utilization of the computational resources? Because of the presence of uncertainty, this scheduling problem is very difficult not only to solve but even to formalize (i.e., to describe in precise terms). In this chapter, we provide the first steps towards formalizing and solving this scheduling problem.

The main result of this chapter first appeared in [12].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Information Management: How to Estimate Degree of Trust

In this chapter, we use the probabilistic and interval uncertainty to estimate the degree of trust in an agent. Some of these results first appeared in [56, 141].

Results and Algorithms

In the traditional approach to trust, we either trust an agent or not. If we trust an agent, we allow this agent full access to a particular task. For example, I trust my bank to handle my account; the bank (my agent) outsources money operations to another company (sub-agent), etc. The problem with this approach is that I may have only 99.9% trust in bank, bank in its contractor, etc. As a result, for long chains, the probability of a security leak may increase beyond any given threshold. To resolve this problem, we must keep track of trust probabilities.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Information Management: How to Measure Loss of Privacy

In this chapter, we use the experience of measuring a degree of mismatch between probability models, p-boxes, etc., described in Chapter 30, to measure loss of privacy. Some of our privacy-related results first appeared in [58].

Formulation and Analysis of the Problem, and the Corresponding Results

Measuring

loss

of

privacy

is

important

. Privacy means, in particular, that we do not disclose all information about ourselves. If some of the originally undisclosed information is disclosed, some privacy is lost. To compare different privacy protection schemes, we must be able to gauge the resulting loss of privacy.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Application to Signal Processing: Using 1-D Radar Observations to Detect a Space Explosion Core among the Explosion Fragments

A radar observes the result of a space explosion. Due to radar’s low horizontal resolution, we get a 1-D signal

x

(

t

) representing different 2-D slices. Based on these slices, we must distinguish between the body at the core of the explosion and the slowly out-moving fragments. We propose new algorithms for processing this 1-D data. Since these algorithms are time-consuming, we also exploit the possibility of parallelizing these algorithms.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Computer Engineering: Timing Analysis of Computer Chips

In chip design, one of the main objectives is to decrease its clock cycle. On the design stage, this time is usually estimated by using worst-case (interval) techniques, in which we only use the bounds on the parameters that lead to delays. This analysis does not take into account that the probability of the worst-case values is usually very small; thus, the resulting estimates are overconservative, leading to unnecessary over-design and under-performance of circuits. If we knew the

exact

probability distributions of the corresponding parameters, then we could use Monte-Carlo simulations (or the corresponding analytical techniques) to get the desired estimates. In practice, however, we only have

partial

information about the corresponding distributions, and we want to produce estimates that are valid for all distributions which are consistent with this information.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Mechanical Engineering: Failure Analysis under Interval and Fuzzy Uncertainty

One of the main objective of mechanics of materials is to predict when the material experiences fracture (fails), and to prevent this failure. With this objective in mind, it is desirable to use it ductile materials, i.e., materials which can sustain large deformations without failure. Von Mises criterion enables us to predict the failure of such ductile materials. To apply this criterion, we need to know the exact stresses applied at different directions. In practice, we only know these stresses with interval or fuzzy uncertainty. In this chapter, we describe how we can apply this criterion under such uncertainty, and how to make this application computationally efficient.

Its main results first appeared in [358].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Applications to Geophysics: Inverse Problem

In many real-life situations, we have several types of uncertainty: measurement uncertainty can lead to probabilistic and/or interval uncertainty, expert estimates come with interval and/or fuzzy uncertainty, etc. In many situations, in addition to measurement uncertainty, we have prior knowledge coming from prior data processing and/or prior knowledge coming from prior interval constraints.

In this chapter, on the example of the seismic inverse problem, we show how to combine these different types of uncertainty.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Need to Go Beyond Interval and Fuzzy Uncertainty

Types

of

uncertainty

that

we

analyzed

so

far

. In the previous chapters, we described the uncertainty of inputs – and the resulting uncertainty in the values of the statistical characteristics and, more generally, the result of data processing. To characterize this uncertainty, we used the following three types of information.

First, we used the information about which values are possible and which values are not possible. In general, such an information can be described by a set. Since most real-world processes are continuous, the set of all possible values is usually connected. In the 1-D case, this means that we have an

interval

. In the multi-D case, if we have interval bounds on each of the variables, we have a box.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Interval Uncertainty: Taking Constraints into Account

For

set

information, in addition to the interval bounds on each variables

x

1

,...,

x

n

, we may have additional information: e.g., we may know that the actual values should satisfy a constraint

g

(

x

1

,...,

x

n

) ≤

g

0

. As we have mentioned earlier, usually, we know the approximate values of

x

i

, so we can safely replace the function

g

(

x

1

,...,

x

n

) by, e.g., the first two terms in its Taylor expansion. In this case, the constraint becomes quadratic, and – in a realistic case when this constraint describes a bounded set – the set of all the tuples

x

= (

x

1

,...,

x

n

) that satisfy this constraint forms an

ellipsoid

. In this case, in addition to knowing that the actual tuple

x

belongs to the box, we also know that it belongs to the ellipsoid – i.e., that the set of possible values of this tuple is an

intersection

of the box and the ellipsoid. Such a situation is analyzed in this chapter.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Interval Uncertainty: Case of Discontinuous Processes (Phase Transitions)

One of the main tasks of science and engineering is to use the current values of the physical quantities for predicting the future values of the desired quantities. Due to the (inevitable) measurement inaccuracy, we usually know the current values of the physical quantities with interval uncertainty. Traditionally, it is assumed that all the processes are continuous; as a result, the range of possible values of the future quantities is also known with interval uncertainty. However, in many practical situations (such as phase transitions), the dependence of the future values on the current ones becomes discontinuous. We show that in such cases, initial interval uncertainties can lead to arbitrary bounded closed ranges of possible values of the future quantities. We also show that the possibility of such a discontinuity may drastically increase the computational complexity of the corresponding range prediction problem.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Interval Uncertainty in Describing Statistical Characteristics: Case of Smooth Distributions and Info-Gap Decision Theory

In the traditional statistical approach, we assume that we know the exact cumulative distribution function (CDF)

F

(

x

). In practice, we often only know the envelopes [

$\underline{F}(x), \overline{F}(x)$

] bounding this CDF, i.e., we know the intervalvalued “p-box” which contains

F

(

x

). P-boxes have been successfully applied to many practical applications. In the p-box approach, we assume that the actual CDF can be any CDF

$F(x) \epsilon [\underline{F}(x), \overline{F}(x)$

]. In many practical situations, however, we know that the actual distribution is smooth. In such situations, we may wish our model to further restrict the set of CDFs by requiring them to share smoothness (and similar) properties with the bounding envelopes

$\underline{F}(x)$

and

$\overline{F}(x)$

. In previous work, ideas from Info-Gap Decision Theory were used to propose heuristic methods for selecting such distributions. In this chapter, we provide justifications for this heuristic approach.

The main results of this chapter first appeared in [38].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Traditional Interval Uncertainty in Describing Statistical Characteristics: Case of Interval Bounds on the Probability Density Function

In the previous chapter, we considered the case when, in addition to knowing the interval bounds on the cumulative distribution function

F

(

x

) (= p-box), we also know that the function

F

(

x

) is smooth. In addition to knowing that

F

(

x

) is smooth – i.e., that its derivative

F

′(

x

) (= a probability density function) is bounded – we sometimes also know the bounds on

F

′(

x

). Such a situation is analyzed in this chapter.

We show that in this situations, the exact range of some statistical characteristics can be efficiently computed. Surprisingly, for some other characteristics, similar statistical problems which are efficiently solvable for interval-valued cdf become computationally difficult (NP-hard) for interval-valued pdf.

The results of this chapter have previously appeared in [354].

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Interval Uncertainty in Describing Statistical Characteristics: Case of Normal Distributions

In the previous two chapters, we considered the case when, in addition to the bounds on the cumulative distribution function

F

(

x

F

(

x

) – e.g., we know that

F

(

x

) is smooth (differentiable), and we know the bounds on the derivative of

F

(

x

F

(

x

); we may even know the analytical expression for

F

(

x

) – with the parameters which are only known with uncertainty. For example, in practice, when the observed signal is caused by a joint effect of many small components, it is reasonable to assume that the distribution is normal – but the parameters of this normal distribution are only known with uncertainty. Such a situation is analyzed in this chapter.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Traditional Fuzzy Uncertainty: Interval-Valued Fuzzy Techniques

For

fuzzy

information, we assumed that we have exact numerical degrees describing expert uncertainty. This is, of course, a simplifying assumption. In practice, an expert can, at best, provide bounds (i.e., an interval) or his or her degree of certainty – or even produce a

fuzzy

degree of certainty (such as “about 0.6”). Situations with interval-valued fuzzy degrees are analyzed in this chapter, and the situations with more general fuzzy-valued degrees (called

type

2) are analyzed in the next chapter.

Intervals

are

necessary

to

describe

degrees

of

belief

. In the previous text, we described an idealized situation, in which we can describe degrees of belief by exact real numbers. In practice, the situation is more complicated, because experts cannot describe their degrees of belief precisely; see, e.g., [251] and references therein.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Beyond Traditional Fuzzy Uncertainty: Type-2 Fuzzy Techniques

For

fuzzy

information, we assumed that we have exact numerical degrees describing expert uncertainty. As we have mentioned in the previous chapter, in practice, an expert can, at best, provide bounds (i.e., an interval) or his or her degree of certainty – or even produce a “type-2”

fuzzy

degree of certainty (such as “about 0.6”).

As we have shown in Part I, processing of data under general type-1 fuzzy uncertainty can be reduced to the simplest case – of interval uncertainty: namely, Zadeh’s extension principle is equivalent to level-by-level interval computations applied to

α

-cuts of the corresponding fuzzy numbers.

Hung T. Nguyen, Vladik Kreinovich, Berlin Wu, Gang Xiang

### Backmatter

Weitere Informationen

## BranchenIndex Online

Die B2B-Firmensuche für Industrie und Wirtschaft: Kostenfrei in Firmenprofilen nach Lieferanten, Herstellern, Dienstleistern und Händlern recherchieren.

## Whitepaper

- ANZEIGE -

### Grundlagen zu 3D-Druck, Produktionssystemen und Lean Production

Lesen Sie in diesem ausgewählten Buchkapitel alles über den 3D-Druck im Hinblick auf Begriffe, Funktionsweise, Anwendungsbereiche sowie Nutzen und Grenzen additiver Fertigungsverfahren. Eigenschaften eines schlanken Produktionssystems sowie der Aspekt der „Schlankheit“ werden ebenso beleuchtet wie die Prinzipien und Methoden der Lean Production.