1 Introduction

Uncertainty is pervasive in the ordinary, everyday activities and decisions of humans. Fuzzy set techniques have been widely recognized for dealing with uncertainty in ambient intelligence (Acampora and Loia 2008) and human-centric systems (Pedrycz 2010). In this paper we are interested in a deeper understanding of such uncertainties and how they can be quantified for human decision makers.

One aspect that must be considered in particular is how to deal with the inherent uncertainty involved when information is aggregated in order to become useful for decision making. Effective decision making should make use of all the available, relevant information about such aggregated uncertainty. In this paper we investigate quantitative measures that can be used to guide the use of aggregated uncertainty. While there are a number of possible approaches to aggregating the gathered uncertainty information, this paper examines aggregation by the soft computing approach of possibilistic conditioning of probability distribution representations, following Yager (2012). This form of aggregation is particularly amenable to the information measures we consider in this paper.

To formalize the problem, let V be a discrete variable taking values in a space X that has both aleatory and epistemic sources of uncertainty (Parsons 2001). Let there be a probability distribution P: X → [0, 1] with pi ∈ [0, 1] and \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) that models the aleatory uncertainty. The epistemic uncertainty can then be modeled by a possibility distribution (Zadeh 1978), Π : X → [0, 1], where π(xi) gives the possibility that xi is the value of V, \( {\text{i}} = 1, 2, \ldots ,{\text{n}} \). A usual requirement here is the normality condition, \( \mathop {\text{Max}}\nolimits_{\text{x}} [\pi ({\text{x}})] = 1 \); that is, at least one element of X must be fully possible. Abbreviating our notation so that pi = p(xi) and πi = π(xi), we have P = {p1, p2,…, pn} and Π = {π1, π2,…, πn}.

In possibilistic conditioning, a function f dependent on both P and Π is used to find a new conditioned probability distribution such that

$$ {\text{f }}({\text{P}},\varPi ) \Rightarrow {\text{new}}\;{\hat{\text{P}}} $$

where \( {\hat{\text{P}}} = \left\{ {{\hat{\text{p}}}_{1} ,{\hat{\text{p}}}_{2} , \ldots ,{\hat{\text{p}}}_{\text{n}} } \right\} \) with

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \pi_{\text{i}} / {\text{K}};\;{\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } $$
(1)

A strength of this approach using conditioned probability is that it also captures Zadeh’s concept of consistency between the possibility and the original probability distribution. Consistency provides an intuition of concurrence between the possibility and probability distributions being aggregated. In Eq. (1), K is identical to Zadeh’s possibility-probability consistency measure (Zadeh 1978), CZ (Π, P); i.e. CZ (Π, P) = K.

As an example of a conditioned probability distribution that could be used to provide guidance to a decision maker, consider the following military problem. Over the first decade of the 21st century, a major cause of casualties in both the Iraq and Afghanistan combat zones has been improvised explosive devices (IEDs). Prevention and avoidance of IED attacks are critical decisions that should be based on an assessment of the most probable IED placements (Benigni and Furrer 2012). One approach is to consider historical probability distributions characterizing typical placement sites. Let the placement sites considered be X1, X2, X3, and X4. The variable VIED takes values from the space X = {X1, X2, X3, X4}. For this example, let the probability distribution for past IED placements be denoted as p(VIEDhistoric) ≡ PIED. So we have the distribution

$$ {\text{P}}_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{0.3},\frac{{{\text{X}}2}}{0.2},\frac{{{\text{X}}3}}{0.4},\frac{{{\text{X}}4}}{0.1}} \right\}, $$

where the upper halves indicate locations and the lower the corresponding probabilities.

Typically there may be additional or more current information, based on intelligence reports, that is subjective in nature. A possibility distribution can be used to represent such subjective information. If intelligence officials provide such an assessment, the corresponding possibility distribution will be denoted as Π (VIEDintelligence) ≡ ΠIED and we have

$$ \varPi_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{1},\frac{{{\text{X}}2}}{0.6},\frac{{{\text{X}}3}}{0.8},\frac{{{\text{X}}4}}{0.2}} \right\} $$

We can now combine these by the possibilistic conditioning approach. Using Eq. (1) we have first

$$ {\text{K}} = \, 0.3 \times 1 \, + \, 0.2 \times 0.6 \, + \, 0.4 \times 0.8 \, + \, 0.1 \times 0.2 \, = \, 0.76, $$

Then,

$$ {\hat{\text{p}}}_{1} = 0.3 \times 1/0.76 = 0.39; \ldots ;{\hat{\text{p}}}_{4} = 0.1 \times 0.2/0.76 = 0.03 $$

The conditioned probability distribution for IED locations is then

$$ {\hat{\text{P}}}_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{0.39},\frac{{{\text{X}}2}}{0.16},\frac{{{\text{X}}3}}{0.42},\frac{{{\text{X}}4}}{0.03}} \right\}. $$
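To make this calculation easy to reproduce, a minimal Python sketch of the conditioning in Eq. (1) follows; the helper name `condition` and the rounding are our own illustrative choices rather than part of the original formulation.

```python
def condition(p, poss):
    """Possibilistic conditioning per Eq. (1): returns (P_hat, K)."""
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    if k == 0:
        raise ValueError("K = 0: total conflict, conditioning is undefined")
    return [pi * ppi / k for pi, ppi in zip(p, poss)], k

# The IED example above:
p_ied = [0.3, 0.2, 0.4, 0.1]       # historical probabilities P_IED
poss_ied = [1.0, 0.6, 0.8, 0.2]    # intelligence possibilities Pi_IED
p_hat, k = condition(p_ied, poss_ied)
print(round(k, 2))                      # 0.76
print([round(x, 2) for x in p_hat])     # [0.39, 0.16, 0.42, 0.03]
```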

The issue at hand is whether \( {\hat{\text{P}}}_{\text{IED}} \) represents an improved estimate of the IED locations. In order to provide intuition and tools to assess this question, the paper proceeds as follows, reusing the IED distributions above as an ongoing numerical example. Section 2 begins by providing theorems for the extreme cases of Π, one of complete certainty and the other of complete uncertainty. These theorems provide simplifications, check results and characterize the approach. The section then continues with the conditioning of two more general Π distributions, giving four classes of Π distributions in all. In Sect. 3, we assess the utility of an aggregated uncertainty, deciding whether the aggregation provides more effective information by means of information measures, including Shannon entropy, the Gini index and Renyi entropy. For our ongoing IED numerical example, the Shannon entropy (Reza 1961),

$$ {\text{S(P)}} = - \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \ln \left( {{\text{p}}_{\text{i}} } \right),} $$
(2)

yields for PIED and \( {\hat{\text{P}}}_{\text{IED}} \)

$$ {\text{S}}\left( {{\hat{\text{P}}}_{\text{IED}} } \right) = 1.13 < {\text{S}}\left( {{\text{P}}_{\text{IED}} } \right) = 1.28 $$
(3)
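As a quick numerical check of Eqs. (2) and (3), the short sketch below recomputes both entropies; the helper name `shannon` is illustrative, and the conditioned distribution uses the rounded values quoted in the text.

```python
import math

def shannon(p):
    """Shannon entropy S(P) in nats (Eq. 2); 0*ln(0) is taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(shannon(p_ied), 2))        # 1.28
print(round(shannon(p_hat_ied), 2))    # 1.13
```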

These measures are also evaluated for the more general analytic cases. Section 4 then discusses consistency and shows that it provides an additional measure that is compatible with the information measures from the previous section. The paper then provides a summary and a discussion of future research in Sect. 5.

2 Aggregation of possibility and probability by conditioning

To examine the conditioning approach further we formulate four distinct cases for the possibility distributions. The first two, complete certainty and complete uncertainty, represent the extreme cases of possibility distributions. Then two intermediate cases, a partial-certainty case and a generalized possibility distribution, are discussed. For each case we provide instantiations based on the two extreme probability distributions, completely certain, Pcc, and completely uncertain, Pcu. These cases are shown in Table 1. Additional measures discussed in Sects. 3 and 4 provide guidance for the use of the result.

Table 1 Possibility and probability distribution cases

2.1 Case 1: complete certainty

A possibility distribution with exactly one possibility value equal to 1 and all other values equal 0 represents a completely certain distribution. Now we will prove the relationship between such a distribution and the conditioned probability.

Theorem 1

If a possibility distribution \( \varPi \) is completely certain, then its conditioned probability \( {\hat{\text{P}}} \) is completely certain.

Proof

\( \varPi \) is completely certain if \( \exists \) k such that \( \pi_{\text{k}} = 1 \) and \( \pi_{\text{i}} = 0,\;\forall \,{\text{i}} \ne {\text{k}} \). To obtain the conditioned probability we first calculate K using Eq. (1):

$$ {\text{K}} = {\text{p}}_{\text{k}} \pi_{\text{k}} + \sum\limits_{{{\text{i}} \ne {\text{k}}}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} = {\text{p}}_{\text{k}} \times 1} + \sum\limits_{{{\text{i}} \ne {\text{k}}}}^{\text{n}} {{\text{p}}_{\text{i}} \times 0 = {\text{p}}_{\text{k}} } $$

So now we find the conditioned probabilities

$$ \begin{aligned} {\hat{\text{p}}}_{\text{k}} =& {\text{p}}_{\text{k}} \pi_{\text{k}} /{\text{K}} = {\text{p}}_{\text{k}} \times 1/{\text{p}}_{\text{k}} = 1 \hfill \\ {\hat{\text{p}}}_{\text{i}} =& {\text{p}}_{\text{i}} \pi_{\text{i}} /{\text{K}} = {\text{p}}_{\text{i}} \times 0/{\text{p}}_{\text{k}} = 0,\;{\text{i}} \ne {\text{k}} \hfill \\ \end{aligned} $$

Thus the conditioned probability distribution \( {\hat{\text{P}}} \) is

$$ {\hat{\text{P}}} = \left\{ {0, \ldots ,{\hat{\text{p}}}_{\text{k}} = 1, \ldots 0} \right\} $$

which is a completely certain probability distribution. □

Some of the issues relative to the interpretation of this result with respect to consistency and conflict will be discussed in Sect. 4.

2.2 Case 2: complete uncertainty

If the possibility distribution makes no distinction among the values of the variable V, we say this implies complete uncertainty. This is represented in the distribution by all values equaling 1, as shown in Table 1.

Theorem 2

If a possibility distribution \( \varPi \) is completely uncertain, then its conditioned probability \( {\hat{\text{P}}} \) is identical to the original probability P.

Proof

\( \varPi \) is completely uncertain if \( \forall {\text{i}},\pi_{\text{i}} = 1. \) To obtain the conditioned probability we first calculate K using Eq. (1)

$$ {\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \times 1} = 1 $$

since \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) for any probability distribution. So now we find the conditioned probabilities

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \pi_{\text{i}} /{\text{K}} = {\text{p}}_{\text{i}} \times 1/1 = {\text{p}}_{\text{i}} $$

Thus the conditioned probability distribution \( {\hat{\text{P}}} \) is

$$ {\hat{\text{P}}} = \left\{ {{\text{p}}_{1} ,{\text{p}}_{2} , \ldots {\text{p}}_{\text{n}} } \right\} = {\text{P}}, $$

which is the original probability distribution. □

The interpretation of this result is that the possibility distribution shows no preference for any specific value and so the default is that the information to be used in a decision should be that represented by the original probability distribution. So for the two extreme probability cases (Table 1) we have respectively:

  1. (a)
    $$ {\text{P}}_{\text{cc}} :\;{\hat{\text{P}}} = \left\{ {0,\,0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t}} = 1, \ldots ,0} \right\} $$
  2. (b)
    $$ {\text{P}}_{\text{cu}} :\;{\hat{\text{P}}} = \left\{ {\frac{1}{\text{n}},\frac{1}{\text{n}}, \ldots \frac{1}{\text{n}}} \right\} $$

Clearly both are valid distributions, as \( \sum\nolimits_{{\text{i} = 1}}^{\text{n}} {\hat{\text{p}}}_{\text{i}} = 1 \).
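Both extreme possibility cases (Theorems 1 and 2) are easy to verify numerically with the conditioning of Eq. (1); the sketch below uses an arbitrary four-element probability distribution, and the helper name is illustrative.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

p = [0.3, 0.2, 0.4, 0.1]

# Theorem 1: a completely certain possibility yields a completely certain P_hat.
print(condition(p, [0.0, 0.0, 1.0, 0.0]))   # [0.0, 0.0, 1.0, 0.0]

# Theorem 2: a completely uncertain possibility leaves P unchanged.
print(condition(p, [1.0, 1.0, 1.0, 1.0]))   # [0.3, 0.2, 0.4, 0.1]
```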

2.3 Case 3: intermediate uncertainty

Here we examine the case that falls between complete certainty and complete uncertainty for a possibility distribution. To represent this we allow m of the possibility values to equal 1, with 1 < m < n. For convenience we index these values from i = 1, so we have for the distribution:

$$ \varPi = \left\{ {1, \, 1, \ldots 1, \, 0, \, 0 \ldots 0} \right\}:\pi_{\text{i}} = 1;{\text{ i}} = 1 \ldots {\text{m}},\pi_{\text{j}} = 0;{\text{ j}} = {\text{m}} + 1 \ldots {\text{n}} $$

Then clearly K = p1 + p2 + ··· + pm and

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \times 1/ \left( {{\text{p}}_{ 1} + {\text{p}}_{ 2} + \cdots + {\text{ p}}_{\text{m}} } \right);{\text{i }} = 1\ldots {\text{m}}; \,{\hat{\text{p}}}_{\text{m + 1}} = \ldots {\hat{\text{p}}}_{\text{n}} = \, 0 $$

In order to understand what happens in this intermediate uncertainty situation, we will examine the two extreme probability distributions being conditioned by this possibility. First for Pcc we have to consider two subcases.

2.3.1 Pcc, Subcase (1); pt = 1; t ≤ m

$$ {\text{K}} = 0 \times 1 + 0 \times 1 + \cdots + \left( {{\text{p}}_{\text{t}} = 1} \right) \times (\pi_{\text{t}} = 1) + \cdots 0 \times 0 = 1 $$
$$ {\hat{\text{p}}}_{\text{t} } = \frac{1 \times 1}{\text{K} } = \frac{1}{1} = 1;\,{\hat{\text{p}}}_{{\text{j} \ne \text{t} }} = 0 $$
$$ {\text{So}} \,\,\,\,{\hat{\text{P}}} = \{ 0, \, 0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t} } = 1, \ldots 0\} = {\text{P}}_{\text{cc}} $$

2.3.2 Pcc Subcase (2); pt = 1; t > m

For this subcase, however, there is a problem since πt = 0, but pt = 1. This case will be discussed further in Sect. 4.

2.3.3 Pcu, Complete uncertainty

Next for the completely uncertain probability distribution Pcu we find

$$ {\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{m}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } + \sum\limits_{{{\text{i}} = {\text{m}} + 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } = {\text{m}} \times \left( {\frac{1}{\text{n}}} \right) \times 1 + \left( {{\text{n}} - {\text{m}}} \right) \times \left( {\frac{1}{\text{n}}} \right) \times 0 = \frac{\text{m}}{\text{n}} $$

Then the conditioned probability values are

$$ {\hat{\text{p}}}_{\text{i}} = \frac{1}{\text{n}} \times 1/\left( {\frac{\text{m}}{\text{n}}} \right) = \frac{1}{\text{m}};\;{\text{i}} = 1 \ldots {\text{m}} $$
$$ {\hat{\text{p}}}_{\text{m + 1}} = \ldots {\hat{\text{p}}}_{\text{n}} = \frac{1}{\text{n}} \times 0/\left( {\frac{\text{m}}{\text{n}}} \right) = 0 $$
$$ {\hat{\text{p}}} = \left\{ {\frac{ 1}{\text{m}},\frac{ 1}{\text{m}}, \ldots ,\frac{ 1}{\text{m}},{\hat{\text{p}}}_{\text{m + 1}} = 0,0, \ldots 0} \right\} $$

Therefore, we have obtained a subset of equally distributed conditioned probabilities corresponding to the possibilities that are 1. Note that these equally distributed probabilities are greater than the \( \frac{1}{\text{n}} \) values for the initial Pcu. Again, this is clearly a valid distribution as \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = {\text{m}} \times \left( {\frac{1}{\text{m}}} \right) + \left( {{\text{n}} - {\text{m}}} \right) \times 0 = 1 \).
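A small numerical sketch of this intermediate case, with illustrative values n = 5 and m = 3, confirms that conditioning Pcu yields 1/m on the first m elements and 0 elsewhere.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

n, m = 5, 3
p_cu = [1.0 / n] * n                 # completely uncertain probability
poss = [1.0] * m + [0.0] * (n - m)   # m fully possible values, the rest impossible
p_hat = condition(p_cu, poss)
print([round(x, 4) for x in p_hat])  # [0.3333, 0.3333, 0.3333, 0.0, 0.0], i.e. 1/m and 0
```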

2.4 Case 4: Generalized possibility distribution

This is a general case for which we index π1 = 1 and, to capture the situation between complete certainty and uncertainty, use weights 0 < wi < 1 for the remaining n − 1 possibility values. So from Table 1 this possibility distribution is:

$$ \varPi = \{ 1,{\text{w}}_{ 2} ,{\text{w}}_{ 3} , \ldots ,{\text{w}}_{\text{n}} \} $$

and for the conditioned probabilities we obtain

$$ {\text{K}} = {\text{p}}_{1} \times 1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{p}}_{\text{i}} {\text{w}}_{\text{i}} } = {\text{p}}_{1} + {\text{K}}^{\prime },\;{\text{where}}\;{\text{K}}^{\prime } = \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{p}}_{\text{i}} {\text{w}}_{\text{i}} } $$
$$ {\hat{\text{p}}}_{1} = {\text{p}}_{1} \times 1/(p_{1} {\text{ + K}}^{\prime });\quad {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \times {\text{w}}_{\text{i}} /(p_{1} {\text{ + K}}^{\prime });\;{\text{i}} = 2 \ldots {\text{n}} $$

Again we will examine the conditioning of the extreme probabilities, and once more we have to consider the subcases of Pcc.

2.4.1 Pcc subcase (1); t = 1, p1 = 1

$$ {\text{K}}^{\prime } = \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {0 \times {\text{w}}_{\text{i}} = 0;} $$
$$ {\hat{\text{p}}}_{ 1} = 1 \times 1/(1 + 0) = 1;\;{\hat{\text{p}}}_{\text{i}} = 0 \times {\text{w}}_{\text{i}} /(1 + 0) = 0;\quad {\text{i}} = 2 \ldots {\text{n}} $$
$$ {\hat{\text{P}}} = \left\{ { 1, \, 0, \ldots 0} \right\} = {\text{P}}_{\text{cc}} $$

2.4.2 Pcc subcase (2); t > 1, pt = 1

We find the conditioned probability here as:

$$ {\text{K}} = 0 \times 1 + {\text{p}}_{\text{t}} \times {\text{w}}_{\text{t}} + \sum\limits_{{{\text{i}} = 2,{\text{i}} \ne {\text{t}}}}^{\text{n}} {0 \times {\text{w}}_{\text{i}} = {\text{w}}_{\text{t}} } $$
$$ {\hat{\text{p}}}_{1} = 0 \times 1/ {\text{w}}_{\text{t}} = 0;\quad {\hat{\text{p}}}_{\text{t}} = {\text{p}}_{\text{t}} \times {\text{w}}_{\text{t}} / {\text{w}}_{\text{t}} = 1\times {\text{w}}_{\text{t}} / {\text{w}}_{\text{t}} = 1 $$
$$ {\hat{\text{p}}}_{\text{i}} = 0 \times {\text{w}}_{\text{i}} /{\text{w}}_{\text{t}} = 0;{\text{ i}} = 2\ldots {\text{n}},{\text{ i}} \ne {\text{t}} $$
$$ {\hat{\text{P}}} = \{ 0, \, 0, \ldots ,{\hat{\text{p}}}_{\text{t}} = 1, \ldots 0\} = {\text{P}}_{\text{cc}} $$

2.4.3 Pcu complete uncertainty

Finally for the completely uncertain probability Pcu

$$ {\text{K}} = \frac{1}{\text{n}} + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} { 1/{\text{n}} \times {\text{w}}_{\text{i}} } = \left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} { {\text{w}}_{\text{i}} } } \right)/{\text{n}} $$
$$ {\hat{\text{p}}}_{1} = \frac{1}{\text{n}} \times 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/{\text{n}} = 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) $$
$$ {\hat{\text{p}}}_{\text{i}} = \frac{1}{\text{n}} \times {\text{w}}_{\text{i}} /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/{\text{n}} = {\text{w}}_{\text{i}} /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)\;\;{\text{i }} = { 2} \ldots {\text{n}} $$

Here we can see that since 0 < wi < 1, \( {\hat{\text{p}}}_{1} < 1 \) and \( {\hat{\text{p}}}_{\text{i}} < {\hat{\text{p}}}_{1} \). Also, these conditioned probabilities still sum to 1.

$$ \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = 1 $$

To consider specific cases for the weights, we look at an equal distribution of the weight values. In a sense this is a default choice: after deciding which possibility value to set to 1, if there is no preference among the other values, equal weights are a reasonable default. Since there are n − 1 weights to be assigned, we use the weight values

$$ {\text{w}}_{1} = 1{\text{ and w}}_{\text{i}} = \frac{1}{{{\text{n}} - 1}},\;{\text{i}} = 2, \ldots ,{\text{n}}. $$

First

$$ \left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( {1 \, + \, \left( {{\text{n}} - 1} \right) \times \frac{1}{{{\text{n}} - 1}} \, } \right) = 2 $$

Then

$$ {\hat{\text{p}}}_{1} = 1/\left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \frac{ 1}{2};\;{\text{and}}\;{\hat{\text{p}}}_{\text{i}} = {\text{w}}_{\text{i}} /\left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( {\frac{1}{{{\text{n}} - 1}}} \right)/2 = \frac{ 1}{{2({\text{n}} - 1)}} $$

So \({\hat{\text{P}}} = \left\{ {\frac{1}{2},\frac{1}{{2({\text{n}} - 1)}},\frac{1}{{2({\text{n}} - 1)}}, \cdots \frac{1}{{2({\text{n}} - 1)}}} \right\} \) and clearly this is a valid distribution as

$$ \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = \frac{1}{2} + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {\frac{1}{{2({\text{n}} - 1)}}} = \frac{1}{2} + ({\text{n}} - 1)\frac{1}{{2({\text{n}} - 1)}} = \frac{1}{2} + \frac{1}{2} = 1 $$
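This equal-weight result can be checked numerically; the sketch below uses an illustrative n = 5 and verifies both the individual values and that the conditioned distribution sums to 1.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

n = 5
p_cu = [1.0 / n] * n                      # completely uncertain probability
poss = [1.0] + [1.0 / (n - 1)] * (n - 1)  # Case 4 with equal weights w_i = 1/(n-1)
p_hat = condition(p_cu, poss)
print(round(p_hat[0], 6))                 # 0.5
print(round(p_hat[1], 6))                 # 0.125, i.e. 1/(2(n-1))
print(abs(sum(p_hat) - 1.0) < 1e-12)      # True: still a valid distribution
```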

3 Information measures

In this section we will consider measures that can be used to evaluate a conditioned probability distribution relative to the original probability. Shannon’s entropy has been a commonly accepted standard for information metrics; however, the concept of information is so rich and broad that multiple approaches to the quantification of information are desirable (Klir 2006; Xu and Erdogmuns 2010). Thus, we will also examine other measures, such as the Gini index and Renyi entropy, in this section.

3.1 Shannon entropy

Shannon entropy has been the most broadly applied measure of randomness or information content (Shannon 1948). For a probability distribution P = {p1, p2,…pn}, as was discussed previously in Eq. (2), \( {\text{S}}({\text{P}}) = - \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } \ln ({\text{p}}_{\text{i}} ) \). The well-known minimum and maximum values for the Shannon entropy are presented in the context of our two extreme probability cases.

First for complete certainty, Pcc, we recall that here, for some t, pt = 1, and so

$$ {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = - \left( {1\,\ln (1) + \sum\limits_{{{\text{i}} \ne {\text{t}}}}^{\text{n}} {0\,\ln (0)} } \right) = - \left( {1 \times 0 + \sum\limits_{{{\text{i}} \ne {\text{t}}}}^{\text{n}} 0 } \right) = 0 $$

Note this follows as \( \mathop {\lim }\nolimits_{{{\text{p}} \to 0^{ + } }} {\text{p}}\ln {\text{p}} = 0 \). That is, when a probability distribution represents complete certainty, we have no uncertainty, i.e. maximum information.

Then for the case of complete uncertainty, represented by the equi-probable distribution Pcu where pi = 1/n for all i,

$$ \text{S}\left( {\text{P}_{{\text{cu}}} } \right) = -\sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{\text{n}}} \times \text{ln }\left( {\frac{1}{\text{n}}} \right) = -\frac{1}{\text{n}}\sum\limits_{{\text{i = 1}}}^{\text{n}} {\left( {\text{ln (1)}-\text{ln (n)}} \right) = -\text{n} \times \frac{1}{\text{n}}\left( {0 - \text{ln (n)}} \right)\text{ = ln (n)}.} $$

That is, when all outcomes are equi-probable we have the most unpredictable, uncertain situation, which represents the minimum information. In summary, the range of Shannon's entropy for a given probability distribution is:

$$ 0 \le {\text{S(P)}} \le {\text{ln (n)}} $$
(4)

3.2 Gini Index

The Gini index, G(P), also known as the Gini coefficient, is a measure of statistical dispersion developed by Gini (1912), and is defined as

$$ {\text{G(P)}} \equiv 1- \sum\limits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } $$
(5)

Some practitioners use G(P) rather than S(P) since it does not involve a logarithm, making analytic solutions simpler. The Gini index is used to study inequalities in various areas such as economics, ecology and engineering (Aristondo et al. 2012). A very important application of the Gini index is as a splitting criterion for decision tree induction in machine learning and data mining (Breiman et al. 1984).

It is accepted in practice for diagnostic test selection that the Shannon and Gini measures are interchangeable (Sent and van de Gaag 2007). The specific relationship of Shannon entropy and the Gini index has been discussed in the literature (Eliazar and Sokolov 2010). Theoretical support for this practice is provided in Yager’s independent consideration of alternative measures of entropy (Yager 1995), where he derives the same form for an entropy measure as the Gini measure.

Now, as done for the Shannon entropy, we consider the maximum and minimum values for G(P). Letting \( \text{R = }\sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } \), then since 0 ≤ pi ≤ 1 (so 0 ≤ \( {\text{p}}_{\text{i}}^{2} \) ≤ 1) and at least one pi > 0, we have 0 < R ≤ 1, with R = 1 only if pt = 1 for some t. Thus G(P) > 0 unless pt = 1 for some t, in which case G(P) = 0. This is the case for the distribution Pcc, since pt = 1 and pi = 0 for i ≠ t. Specifically

$$ {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) = 1 - \left( {{\text{p}}_{\text{t}}^{2} + \sum\limits_{{\text{i} \ne \text{t}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } } \right) = 1 - \left( {1^{2} + 0} \right) = 0 $$

As for the Shannon entropy this corresponds to no uncertainty and has the same value of 0.

Next we examine the index for the equi-probable distribution, Pcu, where pi = 1/n for all i.

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = 1- \sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{{\text{(n)}^{ 2} }} = 1- {\text{n}}\frac{1}{{\text{(n}^{\text{2}} )}} = 1- \frac{1}{\text{n}} = \frac{{\text{n}- 1}}{\text{n}}} $$

Consider the behavior of G(Pcu) as n increases. For \( {\text{n}} = 2\;\left( {{\text{p}}_{1} = \frac{1}{2},\;{\text{p}}_{2} = \frac{1}{2}} \right) \)

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{2 - 1}{2} = 1/ 2 $$

Then for n = 10 (p1 = 0.1,…, p10 = 0.1) we have

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{10 - 1}{10} = 9/ 10 $$

Clearly then for n → ∞, G(P) → 1. Thus, in the case of an equiprobable distribution, we have increasing values for G(Pcu) with n, and in general the range for G(P) is

$$ 0 \le {\text{G}}\left( {\text{P}} \right) \le \frac{{\text{n} - 1}}{\text{n}} < 1 $$
(6)

Now we can use this measure to evaluate our IED example and compare G(P) for the original and the conditioned probability distributions. First

$$ {\text{G}}\left( {{\text{P}}_{\text{IED}} } \right) = 1 - \left( {0.3^{2} + 0.2^{2} + 0.4^{2} + 0.1^{2} } \right) = 1 - 0.3 = 0.7 $$

So for \(\hat{\rm{P}} \)

$$ {\text{G}}(\hat{\rm{P}}_{\text{IED}} ) = 1 - \left( {0.39^{2} + 0.16^{2} + 0.42^{2} + 0.03^{2} } \right) = 1 - 0.355 = 0.645 $$

Thus we see that, as with the Shannon measure result (3), based on the Gini index \( \hat{\rm{P}}_{{_{\text{IED}} }} \) again appears to be more informative than PIED.
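A short sketch of Eq. (5) applied to the IED distributions follows; `gini` is an illustrative helper name and the conditioned distribution again uses the rounded values quoted above.

```python
def gini(p):
    """Gini index G(P) = 1 - sum(p_i^2), Eq. (5)."""
    return 1.0 - sum(pi * pi for pi in p)

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(gini(p_ied), 3))           # 0.7
print(round(gini(p_hat_ied), 3))       # 0.645
```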

3.3 Application of measures to the four cases

In this section we apply the Shannon and Gini measures to the original and conditioned probability distributions for the four possibility distribution cases of Sect. 2 and compare the measures' values. As both measures have increasing values with increasing uncertainty, the conditioned probability will be more informative for decision-making if its measure value is less than that of the original probability. We shall see that both measures basically agree for the cases considered, although their specific values lie in different ranges.

3.3.1 Case 1

For the completely certain possibility, we consider only the subcase where there is no conflict, for which the conditioned probability is

$$ \hat{\text{P}} = \left\{ { 1, \, 0, \ldots 0} \right\} $$

Then we have first for both measures with the distribution Pcc

$$ {\text{S}}(\hat{\text{P}}) = {\text{G}}(\hat{\text{P}}) = 0 = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) $$

But for the equi-probable initial distribution Pcu

$$ {\text{S}}\left( {{\text{P}}_{\text{cu}} } \right) = {\text{ln(n)}} > {\text{S}}(\hat{\text{P}}) = 0 $$
$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{{\text{n} - 1}}{\text{n}} > {\text{G}}(\hat{\text{P}}) = 0 $$

So the conditioned probability distribution is more informative in the second case for the probability Pcu.

3.3.2 Case 2

Next for the case of complete possibilistic uncertainty, we had \( \hat{\text{P}} \) = P for all the probability distributions and so we have

$$ {\text{S}}(\hat{\rm{P}}) = {\text{S}}\left( {\text{P}} \right){\text{ and G}}(\hat{\rm{P}}) = {\text{G}}\left( {\text{P}} \right) $$

We can conclude that the conditioned probability distribution \( \hat{\rm{P}} \) is no more informative than the original probability P since the possibility distribution Π does not contribute any information as it represents complete uncertainty.

3.3.3 Case 3

Recall that this is the intermediate possibility case; here we consider the probability Pcc, first for the Shannon measure and then for the Gini index. Since in the no-conflict subcase \( \hat{\rm{P}} = \left\{ {0,\,0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t}} = 1, \ldots ,0} \right\} \), then as before for this distribution

$$ {\text{S}}(\hat{\text{P}}) = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = 0{\text{ and G}}(\hat{\text{P}}) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) = 0 $$

Next for the equi-probable distribution Pcu, the Shannon measure is

$$ \begin{aligned} {\text{S}}(\hat{\rm{P}}) & = - \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\frac{1}{\text{m}}} \right)} \ln \left( {\frac{1}{\text{m}}} \right) + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0\,\ln \left( 0 \right)} } \right) = - \frac{1}{\text{m}}\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\ln (1) - \ln ({\text{m}})} \right)} \\ & = - \frac{1}{\text{m}} \times \left( { - {\text{m}}\,\ln \left( {\text{m}} \right)} \right) = \ln \left( {\text{m}} \right) \\ \end{aligned} $$

Now since Pcu is an equi-probable distribution and n > m

$$ {\text{S}}\left( {{\text{P}}_{\text{cu}} } \right) = { \ln }\left( {\text{n}} \right) > { \ln }\left( {\text{m}} \right) = {\text{S}}(\hat{\rm{P}}) $$

Next for the Gini measure

$$ {\text{G}}(\hat{\rm{P}}) = 1 - \left( {\sum\limits_{{\rm{i = 1}}}^{\rm{m}} {\hat{\rm{p}}_{\rm{i}}^{2} } + \sum\limits_{{\rm{i = m + 1}}}^{\rm{n}} {\hat{\rm{p}}_{\rm{i}}^{2} } } \right) = 1 - \left( {\sum\limits_{{\text{i = 1}}}^{\rm{m}} {\left( {\frac{1}{{\rm{m}^{\rm{2}} }}} \right) + } \sum\limits_{{\rm{i = m + 1}}}^{\rm{n}} 0 } \right) = 1 - {\text{m}} \times \frac{1}{{\rm{m}^{\rm{2}} }} = 1 - \frac{1}{\text{m}} $$

Recall that G(Pcu) = 1 − 1/n, and since 1 < m < n, 1/n < 1/m, so that

$$ {\text{G}}(\hat{\rm{P}}) = 1- \frac{1}{\text{m}} < 1- \frac{1}{\text{n}} = {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) $$

Thus we see that by both measures the conditioned probability is more informative in this case.

3.3.4 Case 4

This is the case of the generalized possibility distribution, in which for Pcc we saw that \( \hat{\text{P}} \) = Pcc. So again we have

$$ {\text{S}}(\hat{\text{P}}) = {\text{G}}(\hat{\text{P}}) = 0 = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) $$

Next for the other probability distribution, Pcu, we had obtained for \( \hat{\text{P}} \) a general expression in terms of the weights wi. Here we will consider the special case we examined for the equal distribution of the weights where we had for \( \hat{\text{P}} \)

$$ \hat{\text{P}} = \left\{ {\frac{1}{2},\,\frac{1}{{2(\text{n} - 1)}},\,\frac{1}{{2(\text{n} - 1)}}, \ldots \frac{1}{{2(\text{n} - 1)}}} \right\} $$

Now we can apply our measures to this conditioned distribution. First for S(\( \hat{\text{P}} \)):

$$ \begin{aligned} {\text{S}}(\hat{\text{P}}) & = - \left[ {\frac{1}{2}\ln \left( \frac{1}{2} \right) + \sum\limits_{{\text{i = 2}}}^{\text{n}} {\frac{1}{{2({\text{n}} - 1)}}\ln \left( {\frac{1}{{2({\text{n}} - 1)}}} \right)} } \right] \\ & = - \left[ {\frac{1}{2}\left( {\ln 1 - \ln 2} \right) + \left( {{\text{n}} - 1} \right)\frac{1}{{2({\text{n}} - 1)}}\left( {\ln 1 - \ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right)} \right] = - \left[ { - \frac{\ln 2}{2} - \frac{1}{2}\ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right] \\ & = \frac{1}{2}\left[ {\ln 2 + \ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right] \\ \end{aligned} $$
$$ {\text{For n}} = 2,{\text{ S}}(\hat{\text{P}}) = \frac{1}{2}\left[ {{\text{ ln 2 }} + { \ln }\left( { 2\left( 1\right)} \right)} \right] = {\text{ln 2}} = {\text{S}}\left( {\text{P}} \right),{\text{ but for n}} > 2 $$
$$ {\text{S}}(\hat{\text{P}}) < {\text{ln n}} = {\text{S}}\left( {\text{P}} \right) $$

Next for the Gini index:

$$ \begin{aligned} {\text{G}}(\hat{\text{P}}) & = 1 - \left( {\left( \frac{1}{2} \right)^{2} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {\left( {\frac{1}{{2({\text{n}} - 1)}}} \right)^{2} } } \right) = 1 - \left( {\frac{1}{4} + \left( {{\text{n}} - 1} \right) \times \left( {\frac{1}{{2({\text{n}} - 1)}}} \right)^{2} } \right) = 1 - \left( {\frac{1}{4} + \frac{1}{{4({\text{n}} - 1)}}} \right) \\ & = \frac{{3{\text{n}} - 4}}{{4({\text{n}} - 1)}} \\ \end{aligned} $$

Similar to the Shannon measure for n = 2, the Gini measure is the same for P and \( \hat{\text{P}} \)

$$ {\text{G}}(\hat{\text{P}}) = 1- \frac{2}{4} = \frac{1}{2}\quad{\text{and G}}\left( {\text{P}} \right) = 1- \frac{1}{\text{n}} = \frac{1}{2} $$

Finally, for \( {\text{n}} > 2 \), \( {\text{G}}({\hat{\text{P}}}) = \frac{{3{\text{n}} - 4}}{{4{\text{n}} - 4}} < {\text{G}}\left( {\text{P}} \right) = \frac{{{\text{n}} - 1}}{\text{n}} \); for example, for \( {\text{n}} = 3 \), \( {\text{G}}({\hat{\rm {P}}}) = \frac{5}{8} < {\text{G}}\left( {\text{P}} \right) = \frac{2}{3} \), and as \( {\text{n}} \to \infty \), \( {\text{G}}(\hat{\text{P}}) \to \frac{3}{4} \) while \( {\text{G}}\left( {\text{P}} \right) \to 1 \).

To consider this last case more generally, we examine the effect of the range of the equi-distributed weights. When wi → 0, 1 < i ≤ n, Π → {1, 0,…0}, the case of complete certainty, for which we have seen that the conditioned distribution is more informative. If instead wi → 1, 1 < i ≤ n, Π → {1, 1,…1}, the case of complete uncertainty, and the conditioned probability distribution \( \hat{\text{P}} \) is no more informative than the original probability P for either measure.
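The following exploratory sketch, with illustrative helper names, conditions Pcu with the Case 4 possibility distribution for several values of n and for weights near the two limits just described, so the behavior of both measures can be inspected directly.

```python
import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def condition(p, poss):
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

def measures(n, w):
    """Shannon and Gini values for P_hat and P_cu under Case 4 with weight w."""
    p_cu = [1.0 / n] * n
    p_hat = condition(p_cu, [1.0] + [w] * (n - 1))
    return (round(shannon(p_hat), 3), round(shannon(p_cu), 3),
            round(gini(p_hat), 3), round(gini(p_cu), 3))

for n in (3, 10, 100):
    print(n, measures(n, 1.0 / (n - 1)))   # equal weights w_i = 1/(n-1)
print(measures(10, 0.001))   # w -> 0: P_hat far more informative than P_cu
print(measures(10, 0.999))   # w -> 1: P_hat essentially as uninformative as P_cu
```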

3.4 Renyi entropy

Renyi (1961,1970) introduced a parameterized family of entropies as a generalization of Shannon entropy. The intention was to have the most general approach that preserved the additivity property and satisfied the probability axioms of Kolmogorov. Renyi entropy is

$$ {\text{S}}_{\alpha } \left( {\text{P}} \right) = \frac{1}{1 - \alpha } \times \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{{\alpha}} } } \right) $$

3.5 Cases of the parameter α

$$ \alpha = 0{:}\quad {\text{S}}_{0} \left( {\text{P}} \right) = \ln \left| {\text{P}} \right|\quad {\text{(Hartley entropy; Hartley 1928)}} $$
$$ \alpha \to 1{:}\quad {\text{S}}_{1} \left( {\text{P}} \right) = - \sum\limits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}} \times \ln \left( {{\text{p}}_{\text{i}} } \right)} \quad {\text{(Shannon entropy)}} $$

$$ \alpha = 2: {\text{S}}_{ 2} \left( {\text{P}} \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{2} } } \right) - {\text{Collision or quadratic entropy}} $$
(7)
$$ \alpha \to \infty : \, {\text{S}}_{\infty } \left( {\text{P}} \right) = \mathop {\text{Min}}\limits_{{\text{i = 1}}}^{\text{n}} \left( { - {\text{ln p}}_{\text{i}} } \right) = - \mathop {\text{Max}}\limits_{{\text{i = 1}}}^{\text{n}} \left( {{\text{ln p}}_{\text{i}} } \right) $$
(8)

This last case is the smallest entropy in the Renyi family and so is the strongest way to obtain an information content measure. It is never larger than the Shannon entropy. Thus, the possible ranges of α capture the following:

  • High α: emphasizes the high-probability events

  • Low α: weights the possible events more equally

  • α = 0 or α → 1: recovers the Hartley or Shannon entropy, respectively

The Hartley entropy is not of great interest here as for all of our cases here \( \left| {\text{P}} \right| = |\hat{\text{P}}| \), and we have already considered the Shannon entropy. Now we consider the values of the S2 measure, Eq. 7, for our two characteristic probabilities. For Pcc

$$ {\text{S}}_{2} \left( {{\text{P}}_{\text{cc}} } \right) = - \ln \left( {{\text{p}}_{1}^{2} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } } \right) = - \ln \left( {1 + \sum\limits_{{\text{i = 2}}}^{\text{n}} {0^{2} } } \right) = - \ln 1 = 0 $$

and for Pcu

$$ {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{\text{2}} } } \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{{\text{n}^{\text{2}} }}} } \right) = - { \ln }({\text{n}} \times \frac{1}{{\text{n}^{\text{2}} }}) = \, - \left( {{\text{ln 1}} - { \ln }\left( {\text{n}} \right)} \right) = { \ln }\left( {\text{n}} \right) $$

These are the same as the results for the Shannon entropy. To continue we calculate S2 for our IED example as we have done for the Shannon entropy and Gini index. So we have

$$ {\text{S}}_{2} \left( {{\text{P}}_{\text{IED}} } \right) = - \ln \left( {0.3^{2} + 0.2^{2} + 0.4^{2} + 0.1^{2} } \right) = - \ln \left( {0.3} \right) = 1.20 $$

and for the conditioned probability

$$ {\text{S}}_{2} (\hat{\text{P}}_{\text{IED}} ) = - \ln \left( {0.39^{2} + 0.16^{2} + 0.42^{2} + 0.03^{2} } \right) = - \ln \left( {0.355} \right) = 1.04 $$

Again, as for the other two measures, the resulting value for \( \hat{\text{P}} \) is less than for P. We also want to consider, briefly, the effect of larger values of the parameter α. For example, from Eq. 8 for S∞ we have

$$ {\text{S}}_{\infty } \left( {{\text{P}}_{\text{IED}} } \right) = 0.92 > {\text{S}}_{\infty } (\hat{\text{P}}_{\text{IED}} ) = 0.87 $$

This continues the pattern of \( \hat{\text{P}}_{\text{IED}} \) being evaluated as more informative, though we note the difference is somewhat smaller.
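A sketch of the two Renyi cases used here, with illustrative helper names; the printed values agree with those quoted above.

```python
import math

def renyi(p, alpha):
    """Renyi entropy S_alpha(P) for alpha != 1; alpha = 2 is the collision entropy."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def renyi_inf(p):
    """Limit alpha -> infinity (Eq. 8): S_inf(P) = -ln(max p_i)."""
    return -math.log(max(p))

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(renyi(p_ied, 2), 2), round(renyi(p_hat_ied, 2), 2))    # 1.2 1.04
print(round(renyi_inf(p_ied), 2), round(renyi_inf(p_hat_ied), 2))  # 0.92 0.87
```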

Next we can utilize the already determined sums of the squared probabilities from the Gini measure to evaluate S2 for the first three possibility cases.

3.5.1 Case 1

For the probability distribution Pcc we see

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = - { \ln }\left( { 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {0^{2} } } \right) = - {\text{ln 1}} = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right) $$

but for Pcu

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 < {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) = { \ln }\left( {\text{n}} \right) $$

3.5.2 Case 2

Since \( \hat{\text{P}} = {\text{P}}, \)

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right){\text{ and S}}_{ 2} (\hat{\text{P}}) = { \ln }\left( {\text{n}} \right) = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) $$

3.5.3 Case 3

For the completely certain probability as before,

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right) $$

and for Pcu

$$ {\text{S}}_{2} (\hat{\text{P}}) = - \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\hat{\text{p}}_{\text{i}}^{2} } + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {\hat{\text{p}}_{\text{i}}^{2} } } \right) = - \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\frac{1}{\text{m}}} \right)^{2} + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} 0 } } \right) = - \ln \left( {\frac{1}{\text{m}}} \right) = \ln \left( {\text{m}} \right) $$

Again since m < n,

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = { \ln }\left( {\text{m}} \right) < { \ln }\left( {\text{n}} \right) = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right). $$

So we note that the values of S2 in these specific cases are the same as for the Shannon entropy measure; however, the exact numeric values obtained, for example in Eq. 3, are not identical, so we conclude there is a close but not exact relationship between the two measures.

3.6 Example: less informative conditioned probability

Next consider the following example for possibility and probability distributions in order to illustrate that not all \( \hat{\text{P}} \)'s are more informative than an initial probability P. We shall apply our previous information measures and see that these are consistent in their assessments. So let the possibility and probability distributions be:

$$ \varPi = \{ 0.1,\,0.1,\,1.0,\,0.1\} ;\quad {\text{P}} = \{ 0.8,\,0.1,\,0.05,\,0.05\} $$

As before we can compute \( \hat{\text{P}} \)

$$ {\text{K}} = 0.8 \times 0.1 + 0.1 \times 0.1 + 1.0 \times 0.05 + 0.1 \times 0.05 = 0.08 + 0.01 + 0.05 + 0.005 = 0.145 $$

and

$$ {\hat{\text{p}}}_{1} = 0.8 \times 0.1/0.145 = 0.552;\; \ldots ;\;{\hat{\text{p}}}_{4} = 0.05 \times 0.1/0.145 = 0.034 $$
$$ \hat{\text{P}} = \{ 0.552,\,0.069,\,0.345,\,0.034\} $$

We can see intuitively that there is some degree of conflict or lack of consistency between the possibility and probability distributions. For example, for the largest probability, p1 = 0.8, the possibility is quite low, π1 = 0.1. Then where π3 = 1.0, we observe that the corresponding probability, p3 = 0.05, is one of the two lowest probability values.

Now we can assess this situation with the information measures. Starting with Shannon entropy we have the smaller entropy for the initial probability distribution:

$$ {\text{S}}\left( {\text{P}} \right) = 0.708 < {\text{S}}(\hat{\text{P}}) = 0.995 $$

Likewise, the Gini index yields a similar result for this case involving some degree of conflict, indicating that the conditioned probability \( \hat{\text{P}} \) is less informative:

$$ {\text{G}}\left( {\text{P}} \right) = 1 - 0.655 = 0.345 < {\text{G}}(\hat{\text{P}}) = 1 - 0.43 = 0.57 $$

Finally we obtain similar results for the Renyi entropies, S2 and S:

$$ {\text{S}}_{2} \left( {\text{P}} \right) = - \ln \left( {0.655} \right) = 0.423 < {\text{S}}_{2} ({\hat{\text{P}}}) = - \ln \left( {0.430} \right) = 0.844 $$
$$ {\text{S}}_{\infty } \left( {\text{P}} \right) = 0.22 < {\text{S}}_{\infty } (\hat{\text{P}}) = 0.59 $$

So the information measures are compatible with our intuitive assessment of the conflict between Π and P. In the next section, where we discuss Zadeh's consistency measure in some detail, we will see that this measure also indicates a lower consistency for these distributions.
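The sketch below, with illustrative helper names, gathers all of the measures for this conflicting example; the conditioned distribution uses the rounded values given above, so the outputs reproduce the quoted figures up to the last decimal place.

```python
import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def renyi2(p):
    return -math.log(sum(pi * pi for pi in p))

def renyi_inf(p):
    return -math.log(max(p))

poss = [0.1, 0.1, 1.0, 0.1]
p = [0.8, 0.1, 0.05, 0.05]
k = sum(pi * ppi for pi, ppi in zip(p, poss))   # Zadeh consistency / normalizer
p_hat = [0.552, 0.069, 0.345, 0.034]            # conditioned distribution (rounded)

print(round(k, 3))                                         # 0.145
print(round(shannon(p), 3), round(shannon(p_hat), 3))      # 0.708 0.995
print(round(gini(p), 3), round(gini(p_hat), 3))            # 0.345 0.57
print(round(renyi2(p), 3), round(renyi2(p_hat), 3))        # 0.423 0.845
print(round(renyi_inf(p), 2), round(renyi_inf(p_hat), 2))  # 0.22 0.59
```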

4 Consistency evaluations of distributions

In this section, we use Zadeh's consistency measure as another approach to assess the integration of uncertainty representations, as a supplement to the information measures of the previous section. We shall see that this measure yields evaluations compatible with those information measures.

As noted by Sudkamp (1992), a probability–possibility transformation is a “purely mechanical manipulation of the distribution without regard to the underlying problem domain or evidence”. It does not by itself provide guidance of the usefulness of the outcome.

For example, reconsider the result of Theorem 1 with respect to the initial probability distribution. Let pk be a very low probability, i.e. one representing a “rare” event, 0 < pk ≪ 1; however, as we have seen, \( \hat{\text{p}}_{\text{k}} = 1 \), which indicates that although the probability was very small, the possibility distribution asserts that the corresponding event did occur in this particular instance. Furthermore, if the initial probability pk were actually 0, then K = 0 and the conditioned probability is ill defined, as we have the indeterminate result 0/0. Clearly, such results by themselves are unhelpful for decision making, and we will see that the consistency measure reflects this.

A number of consistency measures for probability and possibility distributions have been proposed (Delgado and Moral 1987; Gupta 1993). As we discussed in the introduction, Zadeh's approach,

$$ \text{C}_{\text{Z}} (\Pi ,\rm P) =\sum\limits_{{\text{i = 1}}}^{\text{n}} {{\pi}_{\text{i}} \times } \text{p}_{\text{i}} $$
(9)

is identical to the expression for K in the conditioned probability approach. This measure does not represent an inherent relationship but rather represents the intuition that a lowering of an event’s possibility tends to lower its probability, but not the converse.

Another consistency measure that appears in the literature, CDP (Π, P), is due to Dubois and Prade (1982, 1983). Here for every subset A of the space X,

$$ {\text{C}}_{\text{DP}} \left( {\varPi ,{\text{ P}}} \right) = 1 {\text{ if}}\,\varPi \left( {\text{A}} \right) \ge {\text{P}}\left( {\text{A}} \right) $$
(10)

and is 0 otherwise. This definition is based on the idea that possibility is a weaker representation of a situation than probability.

For our purposes, we focus here on CZ as it provides a range of values with which to evaluate the idea of consistency as it relates to the possibilistic conditioning approach. We can note that the maximum value that \( {\text{C}}_{\text{Z}} \left( {\Pi ,{\text{ P}}} \right) = \sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\pi}_{\text{i}} \times {\text{p}}_{\text{i}} } \) can attain is 1, since \( \sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) and πi is at most 1. Thus, the range of CZ is the interval [0, 1], where 0 can be considered complete inconsistency and 1 complete consistency. In a more general sense we can relate this to the concept of conflict, of which consistency is only one aspect. Conflict is generally thought of as involving broader semantic issues such as source reliability and trustworthiness.

Thus, for the case in Theorem 1 where pk = 0 when πk = 1, evaluation of CZ yields

$$ \begin{aligned} \text{C}_{\text{Z}} \text{(}\Pi \text{,P)} & = {\pi}_{\text{k}} \times \text{p}_{\text{k}} + \sum\limits_{{\text{i} \ne \text{k}}}^{\text{n}} {{\pi}_{\text{i}} \times \text{p}_{\text{i}} } \\ &= 1 \times 0 + \sum\limits_{{\text{i} \ne \text{k}}}^{\text{n}} {0 \times \text{p}_{\text{i}} } \\ & = 0 \\ \end{aligned} $$

This result implies that these distributions are indeed inconsistent, i.e. a total conflict, and we should not expect a valid conditional probability distribution to be produced for such a situation. Resolution of such a conflict can be managed by considerations of semantic issues such as the reliability of the underlying information sources.
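In an implementation, this situation can be detected simply by checking CZ (equivalently K) before conditioning; the sketch below, with illustrative names, flags the total-conflict case just described.

```python
def consistency(p, poss):
    """Zadeh's consistency C_Z(poss, p) = sum of poss_i * p_i (Eq. 9)."""
    return sum(pi * ppi for pi, ppi in zip(p, poss))

def condition(p, poss):
    k = consistency(p, poss)
    if k == 0:
        raise ValueError("C_Z = 0: total conflict, conditioning is undefined (0/0)")
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

poss = [0.0, 1.0, 0.0]    # completely certain possibility on the second element
p = [0.5, 0.0, 0.5]       # but that element has zero probability
print(consistency(p, poss))   # 0.0
try:
    condition(p, poss)
except ValueError as err:
    print(err)                # reports the total conflict instead of dividing by zero
```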

4.1 Zadeh’s consistency measure, Cz, for four possibility cases

We now evaluate Zadeh's consistency measure for the four possibility cases of Sect. 2, some of which show a conflict; a comparative evaluation for example distributions follows in the next subsection. Note that, as a validity check, the conditioned distribution does indeed sum to 1 wherever there is no conflict.

Case 1:

For Π = {1, 0,…, 0}:

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {0 \times 0 = 1;{\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = 1\times \frac{1}{\text{n}} + } \sum\limits_{{\text{i = 2}}}^{\text{n}} {0 \times } \frac{1}{\text{n}} = \frac{1}{\text{n}} $$

For Pcc, this result shows complete consistency, since only the outcome carrying all the probability is considered possible. For Pcu, the measure indicates that there is considerable inconsistency.

Case 2:

For case 2, Π(1, 1,…, 1), which is complete uncertainty, no distinctions are made relative to the probabilities and so both Pcc and Pcu are consistent with the possibility distribution.

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} { 1\times 0 = 1;{\text{ C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = } \sum\limits_{{\text{i = 1}}}^{\text{n}} {1 \times } \frac{1}{\text{n}} = {\text{n}} \times \frac{1}{\text{n}} = 1. $$

Case 3, Pcc subcase 1; pt = 1; t \( \le \) m

With the intermediate possibility case 3 for Pcc where pt = 1 and t \( \le \) m we have

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{m}} { 1\times 0 + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0 \times 0 = 1;} } $$

Case 3, Pcc subcase 2; pt = 1; t > m

As noted in Sect. 2, there is a problem since πt = 0, but pt = 1. As a result, we have zero for the consistency measure,

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}} \right) = \sum\limits_{{\text{i = 1}}}^{\text{m}} { 1\times 0 + \left( {{\text{p}}_{\text{t}} = { 1}} \right) \times \left( {\pi_{\text{t}} = 0} \right) + 0 \times 0 = 0.} $$

This result implies that these distributions are completely inconsistent or in conflict. Thus, no valid conditional probability distribution can be produced for such a situation.

Case 3, Pcu Complete uncertainty

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = \sum\limits_{{\text{i = 1}}}^{\text{m}} {1 \times \frac{1}{\text{n}}} + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0 \times \frac{1}{\text{n}}} = \frac{\text{m}}{\text{n}} < 1 $$

Similar to Case 1 for Pcu, some of the original probabilities, here n − m of them, are not compatible with the possibility distribution, as reflected in the consistency measure. That is, the inconsistency here is due to the contrast between the n − m zero values of Π and the corresponding values of Pcu.

Case 4, Pcc Subcase 1; t = 1, pt = 1:

Finally, for the general possibility case, Case 4, where Pcc = {1, 0,…, 0},

$$ {\text{C}}_{\text{Z}} = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{w}}_{\text{i}} \times 0 = 1.} $$

Case 4, Pcc Subcase 2; t > 1, pt = 1:

Since 0 < wi < 1, we know not all probabilities are fully supported by the possibility distribution. Here, however, all wi > 0, so we do not have a conflict as in Case 3, Subcase 2 above, since CZ (Π, P) = wt × pt = wt with 0 < wt < 1.

Case 4, Pcu complete uncertainty

$$ {\text{C}}_{\text{Z}} = 1\times \frac{1}{\text{n}} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{w}}_{\text{i}} \times } \frac{1}{\text{n}} < \frac{1}{\text{n}} + \frac{{\text{n} - \text{1}}}{\text{n}} = 1 $$

4.2 Consistency for example distributions

Next let us consider the consistency for the IED example. For these distributions, if we recall the value of K then we have

$$ {\text{C}}_{\text{Z}} (\varPi_{\text{IED}} ,{\text{ P}}_{\text{IED}} ) = {\text{K}} = 0.3 + 0.12 + 0.32 + 0.02 = 0.76. $$

We have seen that the values of each of the three information measures we have evaluated for \( \hat{\text{P}}_{\text{IED}} \) are less than their values for PIED. At issue is how specific values of consistency relate to the information measure values. Relative to the range of CZ, 0.76 is reasonably large. We can next see how this consistency value compares to that of the example distributions for which the information assessments showed the conditioned probability to be less informative.

Consider again the possibility and probability distributions of Sect. 3.6 above. For these we observed that all the information measures indicated the conditioned probability \( \hat{\text{P}} \) was less informative than the initial probability P. For these distributions the consistency measure is

$$ {\text{C}}_{\text{Z}} (\varPi ,{\text{ P}}) = {\text{K}} = 0.08 + 0.01 + 0.05 + 0.005 = 0.145 $$

Clearly this consistency value is quite low compared to CZ(ΠIED, PIED). So we can observe that higher consistency values are generally correlated with more informative conditioned probabilities.
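The comparison is straightforward to reproduce: CZ is just the normalizer K of Eq. (1), computed here with an illustrative helper for the two examples.

```python
def consistency(p, poss):
    """Zadeh's consistency C_Z(poss, p) = sum of poss_i * p_i (Eq. 9)."""
    return sum(pi * ppi for pi, ppi in zip(p, poss))

# IED example: reasonably high consistency.
print(round(consistency([0.3, 0.2, 0.4, 0.1], [1.0, 0.6, 0.8, 0.2]), 2))    # 0.76

# Conflicting example of Sect. 3.6: quite low consistency.
print(round(consistency([0.8, 0.1, 0.05, 0.05], [0.1, 0.1, 1.0, 0.1]), 3))  # 0.145
```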

Situations like this can occur in many applications. For example with web assistant agents, uncertainty aggregation appears in the integration of information from sources such as user profiles, proximity-based fuzzy clustering and knowledge-based discovery (Loia et al. 2006).

5 Summary

Decision makers are constantly faced with making choices in complex situations for which they have imperfect and often conflicting information, and they face difficult decisions in making effective use of it. Typically such a mix of information has a variety of associated uncertainty, but ultimately the decision maker must come to specific conclusions or actions based on it. Our research here has developed preliminary approaches to assist in this process by providing information-theory-based quantitative evaluations to guide decisions.

We have developed exact expressions for the conditioned probability based on the extreme cases, completely certain and completely uncertain. For these cases three information measures were applied and yielded compatible results when comparing the informativeness of the original versus the conditioned probability. As well, we carried out the possibilistic conditioning and information evaluations for numerical examples. Additionally, we used the Zadeh consistency measure and have seen that it correlates well with the evaluation results.

We are currently doing research on the aggregation of both multiple possibility distributions and multiple probability distributions. This will allow us to potentially take advantage of such additional information sources before computing the conditioned probability. Also we are developing environments to carry out Monte Carlo simulations to test the conditioning approach and the evaluation measures. We are investigating ways to apply such simulations to actual decision-making and to assess whether more effective outcomes result when the evaluation measures have indicated that the conditioned probability is more informative.