1 Introduction

Uncertainty is pervasive in the ordinary, everyday activities and decisions of humans. Fuzzy set techniques have been widely recognized for dealing with uncertainty in ambient intelligence (Acampora and Loia 2008) and human-centric systems (Pedrycz 2010). In this paper we are interested in a deeper understanding of such uncertainties and how they can be quantified for human decision makers.

One aspect that must be considered in particular is how to deal with the inherent uncertainty involved when information is aggregated in order to become useful for decision making. Effective decision making should make use of all the available, relevant information about such aggregated uncertainty. In this paper we investigate quantitative measures that can be used to guide the use of aggregated uncertainty. While there are a number of possible approaches to aggregating the gathered uncertainty information, this paper examines aggregation by the soft computing approach of possibilistic conditioning of probability distribution representations, following Yager (2012). This form of aggregation is particularly amenable to the information measures we consider in this paper.

To formalize the problem, let V be a discrete variable taking values in a space X that has both aleatory and epistemic sources of uncertainty (Parsons 2001). Let there be a probability distribution P: X → [0, 1] with pi ∈ [0, 1] and \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) that models the aleatory uncertainty. The epistemic uncertainty can then be modeled by a possibility distribution (Zadeh 1978), Π : X → [0, 1], where π(xi) gives the possibility that xi is the value of V, \( {\text{i}} = 1, 2, \ldots ,{\text{n}} \). A usual requirement here is the normality condition, \( \mathop {\text{Max}}\nolimits_{\text{x}} [\pi ({\text{x}})] = 1 \); that is, at least one element of X must be fully possible. Abbreviating our notation so that pi = p(xi) and πi = π(xi), we have P = {p1, p2,…, pn} and Π = {π1, π2,…, πn}.

In possibilistic conditioning, a function f dependent on both P and Π is used to find a new conditioned probability distribution such that

$$ {\text{f }}({\text{P}},\varPi ) \Rightarrow {\text{new}}\;{\hat{\text{P}}} $$

where \( {\hat{\text{P}}} = \left\{ {{\hat{\text{p}}}_{1} ,{\hat{\text{p}}}_{2} , \ldots ,{\hat{\text{p}}}_{\text{n}} } \right\} \) with

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \pi_{\text{i}} / {\text{K}};\;{\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } $$
(1)

A strength of this approach using conditioned probability is that it also captures Zadeh’s concept of consistency between the possibility and the original probability distribution. Consistency provides an intuition of concurrence between the possibility and probability distributions being aggregated. In Eq. (1), K is identical to Zadeh’s possibility-probability consistency measure (Zadeh 1978), CZ (Π, P); i.e. CZ (Π, P) = K.

As an example of a conditioned probability distribution that could be used to provide guidance to a decision maker, consider the following military problem. Over the first decade of the 21st century, a major cause of casualties in both the Iraq and Afghanistan combat zones has been improvised explosive devices (IEDs). Prevention and avoidance of IED attacks are critical decisions that should be based on an assessment of the most probable IED placements (Benigni and Furrer 2012). One approach is to consider historical probability distributions characterizing typical placement sites. Let the placement sites considered be X1, X2, X3, and X4. The variable VIED takes values from the space X = {X1, X2, X3, X4}. For this example, let the probability distribution for past IED placements be denoted as p(VIEDhistoric) ≡ PIED. So we have the distribution

$$ {\text{P}}_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{0.3},\frac{{{\text{X}}2}}{0.2},\frac{{{\text{X}}3}}{0.4},\frac{{{\text{X}}4}}{0.1}} \right\}, $$

where the upper halves indicate locations and the lower the corresponding probabilities.

Typically there may be additional or more current information, based on intelligence reports, that is subjective in nature. A possibility distribution can be used to represent such subjective information. If intelligence officials provide such an assessment, the corresponding possibility distribution will be denoted as Π (VIEDintelligence) ≡ ΠIED and we have

$$ \varPi_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{1},\frac{{{\text{X}}2}}{0.6},\frac{{{\text{X}}3}}{0.8},\frac{{{\text{X}}4}}{0.2}} \right\} $$

We can now combine these by the possibilistic conditioning approach. Using Eq. (1) we have first

$$ {\text{K}} = \, 0.3 \times 1 \, + \, 0.2 \times 0.6 \, + \, 0.4 \times 0.8 \, + \, 0.1 \times 0.2 \, = \, 0.76, $$

Then,

$$ {\hat{\text{p}}}_{1} = 0.3 \times 1/0.76 = 0.39; \ldots ;{\hat{\text{p}}}_{4} = 0.1 \times 0.2/0.76 = 0.03 $$

The conditioned probability distribution for IED locations is then

$$ {\hat{\text{P}}}_{\text{IED}} = \left\{ {\frac{{{\text{X}}1}}{0.39},\frac{{{\text{X}}2}}{0.16},\frac{{{\text{X}}3}}{0.42},\frac{{{\text{X}}4}}{0.03}} \right\}. $$
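To make this calculation easy to reproduce, a minimal Python sketch of the conditioning in Eq. (1) follows; the helper name `condition` and the rounding are our own illustrative choices rather than part of the original formulation.

```python
def condition(p, poss):
    """Possibilistic conditioning per Eq. (1): returns (P_hat, K)."""
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    if k == 0:
        raise ValueError("K = 0: total conflict, conditioning is undefined")
    return [pi * ppi / k for pi, ppi in zip(p, poss)], k

# The IED example above:
p_ied = [0.3, 0.2, 0.4, 0.1]       # historical probabilities P_IED
poss_ied = [1.0, 0.6, 0.8, 0.2]    # intelligence possibilities Pi_IED
p_hat, k = condition(p_ied, poss_ied)
print(round(k, 2))                      # 0.76
print([round(x, 2) for x in p_hat])     # [0.39, 0.16, 0.42, 0.03]
```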

The issue at hand is whether \( {\hat{\text{P}}}_{\text{IED}} \) represents an improved estimate of the IED locations. In order to provide intuition and tools to assess this question, the paper proceeds as follows, reusing the IED distributions above as an ongoing numerical example. Section 2 begins by providing theorems for the extreme cases of Π, one of complete certainty and the other of complete uncertainty. These theorems provide simplifications, check results and characterize the approach. The section then continues with the conditioning of two more general Π distributions, giving four classes of Π distributions in all. In Sect. 3, we assess the utility of an aggregated uncertainty, deciding whether the aggregation provides more effective information by means of information measures, including Shannon entropy, the Gini index and Renyi entropy. For our ongoing IED numerical example, the Shannon entropy (Reza 1961),

$$ {\text{S(P)}} = - \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \ln \left( {{\text{p}}_{\text{i}} } \right),} $$
(2)

yields for PIED and \( {\hat{\text{P}}}_{\text{IED}} \)

$$ {\text{S}}\left( {{\hat{\text{P}}}_{\text{IED}} } \right) = 1.13 < {\text{S}}\left( {{\text{P}}_{\text{IED}} } \right) = 1.28 $$
(3)
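As a quick numerical check of Eqs. (2) and (3), the short sketch below recomputes both entropies; the helper name `shannon` is illustrative, and the conditioned distribution uses the rounded values quoted in the text.

```python
import math

def shannon(p):
    """Shannon entropy S(P) in nats (Eq. 2); 0*ln(0) is taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(shannon(p_ied), 2))        # 1.28
print(round(shannon(p_hat_ied), 2))    # 1.13
```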

These measures are also evaluated for the more general analytic cases. Section 4 then discusses consistency and shows that it provides an additional measure that is compatible with the information measures from the previous section. The paper then provides a summary and a discussion of future research in Sect. 5.

2 Aggregation of possibility and probability by conditioning

To examine the conditioning approach further we formulate four distinct cases for the possibility distributions. The first two, complete certainty and complete uncertainty, represent the extreme cases of possibility distributions. Then two intermediate cases, a partial-certainty case and a generalized possibility distribution, are discussed. For each case we provide instantiations based on the two extreme probability distributions, completely certain, Pcc, and completely uncertain, Pcu. These cases are shown in Table 1. Additional measures discussed in Sects. 3 and 4 provide guidance for the use of the result.

Table 1 Possibility and probability distribution cases

2.1 Case 1: complete certainty

A possibility distribution with exactly one possibility value equal to 1 and all other values equal 0 represents a completely certain distribution. Now we will prove the relationship between such a distribution and the conditioned probability.

Theorem 1

If a possibility distribution \( \varPi \) is completely certain, then its conditioned probability \( {\hat{\text{P}}} \) is completely certain.

Proof

\( \varPi \) is completely certain if \( \exists \) k such that \( \pi_{\text{k}} = 1 \) and \( \pi_{\text{i}} = 0,\;\forall \,{\text{i}} \ne {\text{k}} \). To obtain the conditioned probability we first calculate K using Eq. (1):

$$ {\text{K}} = {\text{p}}_{\text{k}} \pi_{\text{k}} + \sum\limits_{{{\text{i}} \ne {\text{k}}}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} = {\text{p}}_{\text{k}} \times 1} + \sum\limits_{{{\text{i}} \ne {\text{k}}}}^{\text{n}} {{\text{p}}_{\text{i}} \times 0 = {\text{p}}_{\text{k}} } $$

So now we find the conditioned probabilities

$$ \begin{aligned} {\hat{\text{p}}}_{\text{k}} =& {\text{p}}_{\text{k}} \pi_{\text{k}} /{\text{K}} = {\text{p}}_{\text{k}} \times 1/{\text{p}}_{\text{k}} = 1 \hfill \\ {\hat{\text{p}}}_{\text{i}} =& {\text{p}}_{\text{i}} \pi_{\text{i}} /{\text{K}} = {\text{p}}_{\text{i}} \times 0/{\text{p}}_{\text{k}} = 0,\;{\text{i}} \ne {\text{k}} \hfill \\ \end{aligned} $$

Thus the conditioned probability distribution \( {\hat{\text{P}}} \) is

$$ {\hat{\text{P}}} = \left\{ {0, \ldots ,{\hat{\text{p}}}_{\text{k}} = 1, \ldots 0} \right\} $$

which is a completely certain probability distribution. □

Some of the issues relative to the interpretation of this result with respect to consistency and conflict will be discussed in Sect. 4.

2.2 Case 2: complete uncertainty

If the possibility distribution makes no distinction among the values of the variable V, we say this implies complete uncertainty. This is represented in the distribution by all values equaling 1, as shown in Table 1.

Theorem 2

If a possibility distribution \( \varPi \) is completely uncertain, then its conditioned probability \( {\hat{\text{P}}} \) is identical to the original probability P.

Proof

\( \varPi \) is completely uncertain if \( \forall {\text{i}},\pi_{\text{i}} = 1. \) To obtain the conditioned probability we first calculate K using Eq. (1)

$$ {\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } = \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} \times 1} = 1 $$

since \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) for any probability distribution. So now we find the conditioned probabilities

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \pi_{\text{i}} /{\text{K}} = {\text{p}}_{\text{i}} \times 1/1 = {\text{p}}_{\text{i}} $$

Thus the conditioned probability distribution \( {\hat{\text{P}}} \) is

$$ {\hat{\text{P}}} = \left\{ {{\text{p}}_{1} ,{\text{p}}_{2} , \ldots {\text{p}}_{\text{n}} } \right\} = {\text{P}}, $$

which is the original probability distribution. □

The interpretation of this result is that the possibility distribution shows no preference for any specific value and so the default is that the information to be used in a decision should be that represented by the original probability distribution. So for the two extreme probability cases (Table 1) we have respectively:

  1. (a)
    $$ {\text{P}}_{\text{cc}} :\;{\hat{\text{P}}} = \left\{ {0,\,0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t}} = 1, \ldots ,0} \right\} $$
  2. (b)
    $$ {\text{P}}_{\text{cu}} :\;{\hat{\text{P}}} = \left\{ {\frac{1}{\text{n}},\frac{1}{\text{n}}, \ldots \frac{1}{\text{n}}} \right\} $$

Clearly both are valid distributions, as \( \sum\nolimits_{{\text{i} = 1}}^{\text{n}} {\hat{\text{p}}}_{\text{i}} = 1 \).
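Both extreme possibility cases (Theorems 1 and 2) are easy to verify numerically with the conditioning of Eq. (1); the sketch below uses an arbitrary four-element probability distribution, and the helper name is illustrative.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

p = [0.3, 0.2, 0.4, 0.1]

# Theorem 1: a completely certain possibility yields a completely certain P_hat.
print(condition(p, [0.0, 0.0, 1.0, 0.0]))   # [0.0, 0.0, 1.0, 0.0]

# Theorem 2: a completely uncertain possibility leaves P unchanged.
print(condition(p, [1.0, 1.0, 1.0, 1.0]))   # [0.3, 0.2, 0.4, 0.1]
```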

2.3 Case 3: intermediate uncertainty

Here we examine the case that falls between complete certainty and complete uncertainty for a possibility distribution. To represent this we allow m of the possibility values to equal 1, with 1 < m < n. For convenience we index these values from i = 1, so we have for the distribution:

$$ \varPi = \left\{ {1, \, 1, \ldots 1, \, 0, \, 0 \ldots 0} \right\}:\pi_{\text{i}} = 1;{\text{ i}} = 1 \ldots {\text{m}},\pi_{\text{j}} = 0;{\text{ j}} = {\text{m}} + 1 \ldots {\text{n}} $$

Then clearly K = p1 + p2 + ··· + pm and

$$ {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \times 1/ \left( {{\text{p}}_{ 1} + {\text{p}}_{ 2} + \cdots + {\text{ p}}_{\text{m}} } \right);{\text{i }} = 1\ldots {\text{m}}; \,{\hat{\text{p}}}_{\text{m + 1}} = \ldots {\hat{\text{p}}}_{\text{n}} = \, 0 $$

In order to understand what happens in this intermediate uncertainty situation, we will examine the two extreme probability distributions being conditioned by this possibility. First for Pcc we have to consider two subcases.

2.3.1 Pcc, Subcase (1); pt = 1; t ≤ m

$$ {\text{K}} = 0 \times 1 + 0 \times 1 + \cdots + \left( {{\text{p}}_{\text{t}} = 1} \right) \times (\pi_{\text{t}} = 1) + \cdots 0 \times 0 = 1 $$
$$ {\hat{\text{p}}}_{\text{t} } = \frac{1 \times 1}{\text{K} } = \frac{1}{1} = 1;\,{\hat{\text{p}}}_{{\text{j} \ne \text{t} }} = 0 $$
$$ {\text{So}} \,\,\,\,{\hat{\text{P}}} = \{ 0, \, 0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t} } = 1, \ldots 0\} = {\text{P}}_{\text{cc}} $$

2.3.2 Pcc Subcase (2); pt = 1; t > m

For this subcase, however, there is a problem since πt = 0, but pt = 1. This case will be discussed further in Sect. 4.

2.3.3 Pcu, Complete uncertainty

Next for the completely uncertain probability distribution Pcu we find

$$ {\text{K}} = \sum\limits_{{{\text{i}} = 1}}^{\text{m}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } + \sum\limits_{{{\text{i}} = {\text{m}} + 1}}^{\text{n}} {{\text{p}}_{\text{i}} \pi_{\text{i}} } = {\text{m}} \times \left( {\frac{1}{\text{n}}} \right) \times 1 + \left( {{\text{n}} - {\text{m}}} \right) \times \left( {\frac{1}{\text{n}}} \right) \times 0 = \frac{\text{m}}{\text{n}} $$

Then the conditioned probability values are

$$ {\hat{\text{p}}}_{\text{i}} = \frac{1}{\text{n}} \times 1/\left( {\frac{\text{m}}{\text{n}}} \right) = \frac{1}{\text{m}};\;{\text{i}} = 1 \ldots {\text{m}} $$
$$ {\hat{\text{p}}}_{\text{m + 1}} = \ldots {\hat{\text{p}}}_{\text{n}} = \frac{1}{\text{n}} \times 0/\left( {\frac{\text{m}}{\text{n}}} \right) = 0 $$
$$ {\hat{\text{p}}} = \left\{ {\frac{ 1}{\text{m}},\frac{ 1}{\text{m}}, \ldots ,\frac{ 1}{\text{m}},{\hat{\text{p}}}_{\text{m + 1}} = 0,0, \ldots 0} \right\} $$

Therefore, we have obtained a subset of equally distributed conditioned probabilities corresponding to the possibilities that are 1. Note that these equally distributed probabilities are greater than the \( \frac{1}{\text{n}} \) values for the initial Pcu. Again, this is clearly a valid distribution as \( \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = {\text{m}} \times \left( {\frac{1}{\text{m}}} \right) + \left( {{\text{n}} - {\text{m}}} \right) \times 0 = 1 \).
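A small numerical sketch of this intermediate case, with illustrative values n = 5 and m = 3, confirms that conditioning Pcu yields 1/m on the first m elements and 0 elsewhere.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

n, m = 5, 3
p_cu = [1.0 / n] * n                 # completely uncertain probability
poss = [1.0] * m + [0.0] * (n - m)   # m fully possible values, the rest impossible
p_hat = condition(p_cu, poss)
print([round(x, 4) for x in p_hat])  # [0.3333, 0.3333, 0.3333, 0.0, 0.0], i.e. 1/m and 0
```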

2.4 Case 4: Generalized possibility distribution

This is a general case for which we index π1 = 1 and, to capture the situation between complete certainty and uncertainty, use weights 0 < wi < 1 for the remaining n − 1 possibility values. So from Table 1 this possibility distribution is:

$$ \varPi = \{ 1,{\text{w}}_{ 2} ,{\text{w}}_{ 3} , \ldots ,{\text{w}}_{\text{n}} \} $$

and for the conditioned probabilities we obtain

$$ {\text{K}} = {\text{p}}_{1} \times 1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{p}}_{\text{i}} {\text{w}}_{\text{i}} } = {\text{p}}_{1} + {\text{K}}^{\prime },\;{\text{where}}\;{\text{K}}^{\prime } = \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{p}}_{\text{i}} {\text{w}}_{\text{i}} } $$
$$ {\hat{\text{p}}}_{1} = {\text{p}}_{1} \times 1/(p_{1} {\text{ + K}}^{\prime });\quad {\hat{\text{p}}}_{\text{i}} = {\text{p}}_{\text{i}} \times {\text{w}}_{\text{i}} /(p_{1} {\text{ + K}}^{\prime });\;{\text{i}} = 2 \ldots {\text{n}} $$

Again we will examine the conditioning of the extreme probabilities, and once more we have to consider the subcases of Pcc.

2.4.1 Pcc subcase (1); t = 1, p1 = 1

$$ {\text{K}}^{\prime } = \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {0 \times {\text{w}}_{\text{i}} = 0;} $$
$$ {\hat{\text{p}}}_{ 1} = 1 \times 1/(1 + 0) = 1;\;{\hat{\text{p}}}_{\text{i}} = 0 \times {\text{w}}_{\text{i}} /(1 + 0) = 0;\quad {\text{i}} = 2 \ldots {\text{n}} $$
$$ {\hat{\text{P}}} = \left\{ { 1, \, 0, \ldots 0} \right\} = {\text{P}}_{\text{cc}} $$

2.4.2 Pcc subcase (2); t > 1, pt = 1

We find the conditioned probability here as:

$$ {\text{K}} = 0 \times 1 + {\text{p}}_{\text{t}} \times {\text{w}}_{\text{t}} + \sum\limits_{{{\text{i}} = 2,{\text{i}} \ne {\text{t}}}}^{\text{n}} {0 \times {\text{w}}_{\text{i}} = {\text{w}}_{\text{t}} } $$
$$ {\hat{\text{p}}}_{1} = 0 \times 1/ {\text{w}}_{\text{t}} = 0;\quad {\hat{\text{p}}}_{\text{t}} = {\text{p}}_{\text{t}} \times {\text{w}}_{\text{t}} / {\text{w}}_{\text{t}} = 1\times {\text{w}}_{\text{t}} / {\text{w}}_{\text{t}} = 1 $$
$$ {\hat{\text{p}}}_{\text{i}} = 0 \times {\text{w}}_{\text{i}} /{\text{w}}_{\text{t}} = 0;{\text{ i}} = 2\ldots {\text{n}},{\text{ i}} \ne {\text{t}} $$
$$ {\hat{\text{P}}} = \{ 0, \, 0, \ldots ,{\hat{\text{p}}}_{\text{t}} = 1, \ldots 0\} = {\text{P}}_{\text{cc}} $$

2.4.3 Pcu complete uncertainty

Finally for the completely uncertain probability Pcu

$$ {\text{K}} = \frac{1}{\text{n}} + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} { 1/{\text{n}} \times {\text{w}}_{\text{i}} } = \left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} { {\text{w}}_{\text{i}} } } \right)/{\text{n}} $$
$$ {\hat{\text{p}}}_{1} = \frac{1}{\text{n}} \times 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/{\text{n}} = 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) $$
$$ {\hat{\text{p}}}_{\text{i}} = \frac{1}{\text{n}} \times {\text{w}}_{\text{i}} /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/{\text{n}} = {\text{w}}_{\text{i}} /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)\;\;{\text{i }} = { 2} \ldots {\text{n}} $$

Here we can see that since 0 < wi < 1, \( {\hat{\text{p}}}_{1} < 1 \) and \( {\hat{\text{p}}}_{\text{i}} < {\hat{\text{p}}}_{1} \). Also, these conditioned probabilities still sum to 1.

$$ \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = 1/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } /\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right)/\left( { 1+ \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = 1 $$

To consider specific cases for the weights, we look at an equal distribution of the weight values. In a sense this is a default choice: after deciding which possibility value to set to 1, if there is no preference among the other values, equal weights are a reasonable default. Since there are n − 1 weights to be assigned, we use the weight values

$$ {\text{w}}_{1} = 1{\text{ and w}}_{\text{i}} = \frac{1}{{{\text{n}} - 1}},\;{\text{i}} = 2, \ldots ,{\text{n}}. $$

First

$$ \left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( {1 \, + \, \left( {{\text{n}} - 1} \right) \times \frac{1}{{{\text{n}} - 1}} \, } \right) = 2 $$

Then

$$ {\hat{\text{p}}}_{1} = 1/\left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \frac{ 1}{2};\;{\text{and}}\;{\hat{\text{p}}}_{\text{i}} = {\text{w}}_{\text{i}} /\left( {1 + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {{\text{w}}_{\text{i}} } } \right) = \left( {\frac{1}{{{\text{n}} - 1}}} \right)/2 = \frac{ 1}{{2({\text{n}} - 1)}} $$

So \({\hat{\text{P}}} = \left\{ {\frac{1}{2},\frac{1}{{2({\text{n}} - 1)}},\frac{1}{{2({\text{n}} - 1)}}, \cdots \frac{1}{{2({\text{n}} - 1)}}} \right\} \) and clearly this is a valid distribution as

$$ \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {{\hat{\text{p}}}_{\text{i}} } = \frac{1}{2} + \sum\limits_{{{\text{i}} = 2}}^{\text{n}} {\frac{1}{{2({\text{n}} - 1)}}} = \frac{1}{2} + ({\text{n}} - 1)\frac{1}{{2({\text{n}} - 1)}} = \frac{1}{2} + \frac{1}{2} = 1 $$
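This equal-weight result can be checked numerically; the sketch below uses an illustrative n = 5 and verifies both the individual values and that the conditioned distribution sums to 1.

```python
def condition(p, poss):
    # Conditioning of Eq. (1); assumes no total conflict (K > 0).
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

n = 5
p_cu = [1.0 / n] * n                      # completely uncertain probability
poss = [1.0] + [1.0 / (n - 1)] * (n - 1)  # Case 4 with equal weights w_i = 1/(n-1)
p_hat = condition(p_cu, poss)
print(round(p_hat[0], 6))                 # 0.5
print(round(p_hat[1], 6))                 # 0.125, i.e. 1/(2(n-1))
print(abs(sum(p_hat) - 1.0) < 1e-12)      # True: still a valid distribution
```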

3 Information measures

In this section we will consider measures that can be used to evaluate a conditioned probability distribution relative to the original probability. Shannon’s entropy has been a commonly accepted standard for information metrics; however, the concept of information is so rich and broad that multiple approaches to the quantification of information are desirable (Klir 2006; Xu and Erdogmuns 2010). Thus, we will also examine other measures, such as the Gini index and Renyi entropy, in this section.

3.1 Shannon entropy

Shannon entropy has been the most broadly applied measure of randomness or information content (Shannon 1948). For a probability distribution P = {p1, p2,…pn}, as was discussed previously in Eq. (2), \( {\text{S}}({\text{P}}) = - \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{p}}_{\text{i}} } \ln ({\text{p}}_{\text{i}} ) \). The well-known minimum and maximum values for the Shannon entropy are presented in the context of our two extreme probability cases.

First for complete certainty, Pcc, we recall that here, for some t, pt = 1, and so

$$ {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = - \left( {1\,\ln (1) + \sum\limits_{{{\text{i}} \ne {\text{t}}}}^{\text{n}} {0\,\ln (0)} } \right) = - \left( {1 \times 0 + \sum\limits_{{{\text{i}} \ne {\text{t}}}}^{\text{n}} 0 } \right) = 0 $$

Note this follows as \( \mathop {\lim }\nolimits_{{{\text{p}} \to 0^{ + } }} {\text{p}}\ln {\text{p}} = 0 \). That is, when a probability distribution represents complete certainty, we have no uncertainty, i.e. maximum information.

Then for the case of complete uncertainty, represented by the equi-probable distribution Pcu where pi = 1/n for all i,

$$ \text{S}\left( {\text{P}_{{\text{cu}}} } \right) = -\sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{\text{n}}} \times \text{ln }\left( {\frac{1}{\text{n}}} \right) = -\frac{1}{\text{n}}\sum\limits_{{\text{i = 1}}}^{\text{n}} {\left( {\text{ln (1)}-\text{ln (n)}} \right) = -\text{n} \times \frac{1}{\text{n}}\left( {0 - \text{ln (n)}} \right)\text{ = ln (n)}.} $$

That is, when all outcomes are equi-probable we have the most unpredictable, uncertain situation, which represents the minimum information. In summary, the range of Shannon's entropy for a given probability distribution is:

$$ 0 \le {\text{S(P)}} \le {\text{ln (n)}} $$
(4)

3.2 Gini Index

The Gini index, G(P), also known as the Gini coefficient, is a measure of statistical dispersion developed by Gini (1912), and is defined as

$$ {\text{G(P)}} \equiv 1- \sum\limits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } $$
(5)

Some practitioners use G(P) rather than S(P) since it does not involve a logarithm, making analytic solutions simpler. The Gini index is used to study inequalities in various areas such as economics, ecology and engineering (Aristondo et al. 2012). A very important application of the Gini index is as a splitting criterion for decision tree induction in machine learning and data mining (Breiman et al. 1984).

It is accepted in practice for diagnostic test selection that the Shannon and Gini measures are interchangeable (Sent and van de Gaag 2007). The specific relationship of Shannon entropy and the Gini index has been discussed in the literature (Eliazar and Sokolov 2010). Theoretical support for this practice is provided in Yager’s independent consideration of alternative measures of entropy (Yager 1995), where he derives the same form for an entropy measure as the Gini measure.

Now, as done for the Shannon entropy, we consider the maximum and minimum values for G(P). Letting \( \text{R = }\sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } \), then since 0 ≤ pi ≤ 1 (so 0 ≤ \( {\text{p}}_{\text{i}}^{2} \) ≤ 1) and at least one pi > 0, we have 0 < R ≤ 1, with R = 1 only if pt = 1 for some t. Thus G(P) > 0 unless pt = 1 for some t, in which case G(P) = 0. This is the case for the distribution Pcc, since pt = 1 and pi = 0 for i ≠ t. Specifically

$$ {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) = 1 - \left( {{\text{p}}_{\text{t}}^{2} + \sum\limits_{{\text{i} \ne \text{t}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } } \right) = 1 - \left( {1^{2} + 0} \right) = 0 $$

As for the Shannon entropy this corresponds to no uncertainty and has the same value of 0.

Next we examine the index for the equi-probable distribution, Pcu, where pi = 1/n for all i.

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = 1- \sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{{\text{(n)}^{ 2} }} = 1- {\text{n}}\frac{1}{{\text{(n}^{\text{2}} )}} = 1- \frac{1}{\text{n}} = \frac{{\text{n}- 1}}{\text{n}}} $$

Consider the behavior of G(Pcu) as n increases. For \( {\text{n}} = 2\;\left( {{\text{p}}_{1} = \frac{1}{2},\;{\text{p}}_{2} = \frac{1}{2}} \right) \)

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{2 - 1}{2} = 1/ 2 $$

Then for n = 10 (p1 = 0.1,…, p10 = 0.1) we have

$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{10 - 1}{10} = 9/ 10 $$

Clearly then for n → ∞, G(P) → 1. Thus, in the case of an equiprobable distribution, we have increasing values for G(Pcu) with n, and in general the range for G(P) is

$$ 0 \le {\text{G}}\left( {\text{P}} \right) \le \frac{{\text{n} - 1}}{\text{n}} < 1 $$
(6)

Now we can use this measure to evaluate our IED example and compare G(P) for the original and the conditioned probability distributions. First

$$ {\text{G}}\left( {{\text{P}}_{\text{IED}} } \right) = 1 - \left( {0.3^{2} + 0.2^{2} + 0.4^{2} + 0.1^{2} } \right) = 1 - 0.3 = 0.7 $$

So for \(\hat{\rm{P}} \)

$$ {\text{G}}(\hat{\rm{P}}_{\text{IED}} ) = 1 - \left( {0.39^{2} + 0.16^{2} + 0.42^{2} + 0.03^{2} } \right) = 1 - 0.355 = 0.645 $$

Thus we see that, as with the Shannon measure result (3), based on the Gini index \( \hat{\rm{P}}_{{_{\text{IED}} }} \) again appears to be more informative than PIED.
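A short sketch of Eq. (5) applied to the IED distributions follows; `gini` is an illustrative helper name and the conditioned distribution again uses the rounded values quoted above.

```python
def gini(p):
    """Gini index G(P) = 1 - sum(p_i^2), Eq. (5)."""
    return 1.0 - sum(pi * pi for pi in p)

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(gini(p_ied), 3))           # 0.7
print(round(gini(p_hat_ied), 3))       # 0.645
```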

3.3 Application of measures to the four cases

In this section we apply the Shannon and Gini measures to the original and conditioned probability distributions for the four possibility distribution cases of Sect. 2 and compare the measures' values. As both measures have increasing values with increasing uncertainty, the conditioned probability will be more informative for decision-making if its measure value is less than that of the original probability. We shall see that both measures basically agree for the cases considered, although their specific values lie in different ranges.

3.3.1 Case 1

For the completely certain possibility, we consider only the subcase where there is no conflict, for which the conditioned probability is

$$ \hat{\text{P}} = \left\{ { 1, \, 0, \ldots 0} \right\} $$

Then we have first for both measures with the distribution Pcc

$$ {\text{S}}(\hat{\text{P}}) = {\text{G}}(\hat{\text{P}}) = 0 = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) $$

But for the equi-probable initial distribution Pcu

$$ {\text{S}}\left( {{\text{P}}_{\text{cu}} } \right) = {\text{ln(n)}} > {\text{S}}(\hat{\text{P}}) = 0 $$
$$ {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) = \frac{{\text{n} - 1}}{\text{n}} > {\text{G}}(\hat{\text{P}}) = 0 $$

So the conditioned probability distribution is more informative in the second case for the probability Pcu.

3.3.2 Case 2

Next for the case of complete possibilistic uncertainty, we had \( \hat{\text{P}} \) = P for all the probability distributions and so we have

$$ {\text{S}}(\hat{\rm{P}}) = {\text{S}}\left( {\text{P}} \right){\text{ and G}}(\hat{\rm{P}}) = {\text{G}}\left( {\text{P}} \right) $$

We can conclude that the conditioned probability distribution \( \hat{\rm{P}} \) is no more informative than the original probability P since the possibility distribution Π does not contribute any information as it represents complete uncertainty.

3.3.3 Case 3

Recall that this is the intermediate possibility case; here we consider the probability Pcc, first for the Shannon measure and then for the Gini index. Since in the no-conflict subcase \( \hat{\rm{P}} = \left\{ {0,\,0, \ldots ,0,\,{\hat{\text{p}}}_{\text{t}} = 1, \ldots ,0} \right\} \), then as before for this distribution

$$ {\text{S}}(\hat{\text{P}}) = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = 0{\text{ and G}}(\hat{\text{P}}) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) = 0 $$

Next for the equi-probable distribution Pcu, the Shannon measure is

$$ \begin{aligned} {\text{S}}(\hat{\rm{P}}) & = - \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\frac{1}{\text{m}}} \right)} \ln \left( {\frac{1}{\text{m}}} \right) + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0\,\ln \left( 0 \right)} } \right) = - \frac{1}{\text{m}}\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\ln (1) - \ln ({\text{m}})} \right)} \\ & = - \frac{1}{\text{m}} \times \left( { - {\text{m}}\,\ln \left( {\text{m}} \right)} \right) = \ln \left( {\text{m}} \right) \\ \end{aligned} $$

Now since Pcu is an equi-probable distribution and n > m

$$ {\text{S}}\left( {{\text{P}}_{\text{cu}} } \right) = { \ln }\left( {\text{n}} \right) > { \ln }\left( {\text{m}} \right) = {\text{S}}(\hat{\rm{P}}) $$

Next for the Gini measure

$$ {\text{G}}(\hat{\rm{P}}) = 1 - \left( {\sum\limits_{{\rm{i = 1}}}^{\rm{m}} {\hat{\rm{p}}_{\rm{i}}^{2} } + \sum\limits_{{\rm{i = m + 1}}}^{\rm{n}} {\hat{\rm{p}}_{\rm{i}}^{2} } } \right) = 1 - \left( {\sum\limits_{{\text{i = 1}}}^{\rm{m}} {\left( {\frac{1}{{\rm{m}^{\rm{2}} }}} \right) + } \sum\limits_{{\rm{i = m + 1}}}^{\rm{n}} 0 } \right) = 1 - {\text{m}} \times \frac{1}{{\rm{m}^{\rm{2}} }} = 1 - \frac{1}{\text{m}} $$

Recall that G(Pcu) = 1 − 1/n, and since 1 < m < n, 1/n < 1/m, so that

$$ {\text{G}}(\hat{\rm{P}}) = 1- \frac{1}{\text{m}} < 1- \frac{1}{\text{n}} = {\text{G}}\left( {{\text{P}}_{\text{cu}} } \right) $$

Thus we see that by both measures the conditioned probability is more informative in this case.

3.3.4 Case 4

This is the case of the generalized possibility distribution, in which for Pcc we saw that \( \hat{\text{P}} \) = Pcc. So again we have

$$ {\text{S}}(\hat{\text{P}}) = {\text{G}}(\hat{\text{P}}) = 0 = {\text{S}}\left( {{\text{P}}_{\text{cc}} } \right) = {\text{G}}\left( {{\text{P}}_{\text{cc}} } \right) $$

Next for the other probability distribution, Pcu, we had obtained for \( \hat{\text{P}} \) a general expression in terms of the weights wi. Here we will consider the special case we examined for the equal distribution of the weights where we had for \( \hat{\text{P}} \)

$$ \hat{\text{P}} = \left\{ {\frac{1}{2},\,\frac{1}{{2(\text{n} - 1)}},\,\frac{1}{{2(\text{n} - 1)}}, \ldots \frac{1}{{2(\text{n} - 1)}}} \right\} $$

Now we can apply our measures to this conditioned distribution. First for S(\( \hat{\text{P}} \)):

$$ \begin{aligned} {\text{S}}(\hat{\text{P}}) & = - \left[ {\frac{1}{2}\ln \left( \frac{1}{2} \right) + \sum\limits_{{\text{i = 2}}}^{\text{n}} {\frac{1}{{2({\text{n}} - 1)}}\ln \left( {\frac{1}{{2({\text{n}} - 1)}}} \right)} } \right] \\ & = - \left[ {\frac{1}{2}\left( {\ln 1 - \ln 2} \right) + \left( {{\text{n}} - 1} \right)\frac{1}{{2({\text{n}} - 1)}}\left( {\ln 1 - \ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right)} \right] = - \left[ { - \frac{\ln 2}{2} - \frac{1}{2}\ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right] \\ & = \frac{1}{2}\left[ {\ln 2 + \ln \left( {2\left( {{\text{n}} - 1} \right)} \right)} \right] \\ \end{aligned} $$
$$ {\text{For n}} = 2,{\text{ S}}(\hat{\text{P}}) = \frac{1}{2}\left[ {{\text{ ln 2 }} + { \ln }\left( { 2\left( 1\right)} \right)} \right] = {\text{ln 2}} = {\text{S}}\left( {\text{P}} \right),{\text{ but for n}} > 2 $$
$$ {\text{S}}(\hat{\text{P}}) < {\text{ln n}} = {\text{S}}\left( {\text{P}} \right) $$

Next for the Gini index:

$$ \begin{aligned} {\text{G}}(\hat{\text{P}}) & = 1 - \left( {\left( \frac{1}{2} \right)^{2} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {\left( {\frac{1}{{2({\text{n}} - 1)}}} \right)^{2} } } \right) = 1 - \left( {\frac{1}{4} + \left( {{\text{n}} - 1} \right) \times \left( {\frac{1}{{2({\text{n}} - 1)}}} \right)^{2} } \right) = 1 - \left( {\frac{1}{4} + \frac{1}{{4({\text{n}} - 1)}}} \right) \\ & = \frac{{3{\text{n}} - 4}}{{4({\text{n}} - 1)}} \\ \end{aligned} $$

Similar to the Shannon measure for n = 2, the Gini measure is the same for P and \( \hat{\text{P}} \)

$$ {\text{G}}(\hat{\text{P}}) = 1- \frac{2}{4} = \frac{1}{2}\quad{\text{and G}}\left( {\text{P}} \right) = 1- \frac{1}{\text{n}} = \frac{1}{2} $$

Finally, for \( {\text{n}} > 2 \), \( {\text{G}}({\hat{\text{P}}}) = \frac{{3{\text{n}} - 4}}{{4{\text{n}} - 4}} < {\text{G}}\left( {\text{P}} \right) = \frac{{{\text{n}} - 1}}{\text{n}} \); for example, for \( {\text{n}} = 3 \), \( {\text{G}}({\hat{\rm {P}}}) = \frac{5}{8} < {\text{G}}\left( {\text{P}} \right) = \frac{2}{3} \), and as \( {\text{n}} \to \infty \), \( {\text{G}}(\hat{\text{P}}) \to \frac{3}{4} \) while \( {\text{G}}\left( {\text{P}} \right) \to 1 \).

To consider this last case more generally, we examine the effect of the range of the equi-distributed weights. When wi → 0, 1 < i ≤ n, Π → {1, 0,…0}, the case of complete certainty, for which we have seen that the conditioned distribution is more informative. If instead wi → 1, 1 < i ≤ n, Π → {1, 1,…1}, the case of complete uncertainty, and the conditioned probability distribution \( \hat{\text{P}} \) is no more informative than the original probability P for either measure.
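The following exploratory sketch, with illustrative helper names, conditions Pcu with the Case 4 possibility distribution for several values of n and for weights near the two limits just described, so the behavior of both measures can be inspected directly.

```python
import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def condition(p, poss):
    k = sum(pi * ppi for pi, ppi in zip(p, poss))
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

def measures(n, w):
    """Shannon and Gini values for P_hat and P_cu under Case 4 with weight w."""
    p_cu = [1.0 / n] * n
    p_hat = condition(p_cu, [1.0] + [w] * (n - 1))
    return (round(shannon(p_hat), 3), round(shannon(p_cu), 3),
            round(gini(p_hat), 3), round(gini(p_cu), 3))

for n in (3, 10, 100):
    print(n, measures(n, 1.0 / (n - 1)))   # equal weights w_i = 1/(n-1)
print(measures(10, 0.001))   # w -> 0: P_hat far more informative than P_cu
print(measures(10, 0.999))   # w -> 1: P_hat essentially as uninformative as P_cu
```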

3.4 Renyi entropy

Renyi (1961,1970) introduced a parameterized family of entropies as a generalization of Shannon entropy. The intention was to have the most general approach that preserved the additivity property and satisfied the probability axioms of Kolmogorov. Renyi entropy is

$$ {\text{S}}_{\alpha } \left( {\text{P}} \right) = \frac{1}{1 - \alpha } \times \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{{\alpha}} } } \right) $$

3.5 Cases of the parameter α

$$ \alpha = 0{:}\quad {\text{S}}_{0} \left( {\text{P}} \right) = \ln \left| {\text{P}} \right|\quad {\text{(Hartley entropy; Hartley 1928)}} $$
$$ \alpha \to 1{:}\quad {\text{S}}_{1} \left( {\text{P}} \right) = - \sum\limits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}} \times \ln \left( {{\text{p}}_{\text{i}} } \right)} \quad {\text{(Shannon entropy)}} $$

$$ \alpha = 2: {\text{S}}_{ 2} \left( {\text{P}} \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{2} } } \right) - {\text{Collision or quadratic entropy}} $$
(7)
$$ \alpha \to \infty : \, {\text{S}}_{\infty } \left( {\text{P}} \right) = \mathop {\text{Min}}\limits_{{\text{i = 1}}}^{\text{n}} \left( { - {\text{ln p}}_{\text{i}} } \right) = - \mathop {\text{Max}}\limits_{{\text{i = 1}}}^{\text{n}} \left( {{\text{ln p}}_{\text{i}} } \right) $$
(8)

This last case is the smallest entropy in the Renyi family and so is the strongest way to obtain an information content measure. It is never larger than the Shannon entropy. Thus, the possible ranges of α capture the following:

  • High α: emphasizes the high-probability events

  • Low α: weights the possible events more equally

  • α = 0 or α → 1: recovers the Hartley or Shannon entropy, respectively

The Hartley entropy is not of great interest here as for all of our cases here \( \left| {\text{P}} \right| = |\hat{\text{P}}| \), and we have already considered the Shannon entropy. Now we consider the values of the S2 measure, Eq. 7, for our two characteristic probabilities. For Pcc

$$ {\text{S}}_{2} \left( {{\text{P}}_{\text{cc}} } \right) = - \ln \left( {{\text{p}}_{1}^{2} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{p}}_{\text{i}}^{2} } } \right) = - \ln \left( {1 + \sum\limits_{{\text{i = 2}}}^{\text{n}} {0^{2} } } \right) = - \ln 1 = 0 $$

and for Pcu

$$ {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\text{p}_{\text{i}}^{\text{2}} } } \right) = - { \ln }\left( {\sum\limits_{{\text{i = 1}}}^{\text{n}} {\frac{1}{{\text{n}^{\text{2}} }}} } \right) = - { \ln }({\text{n}} \times \frac{1}{{\text{n}^{\text{2}} }}) = \, - \left( {{\text{ln 1}} - { \ln }\left( {\text{n}} \right)} \right) = { \ln }\left( {\text{n}} \right) $$

These are the same as the results for the Shannon entropy. To continue we calculate S2 for our IED example as we have done for the Shannon entropy and Gini index. So we have

$$ {\text{S}}_{2} \left( {{\text{P}}_{\text{IED}} } \right) = - \ln \left( {0.3^{2} + 0.2^{2} + 0.4^{2} + 0.1^{2} } \right) = - \ln \left( {0.3} \right) = 1.20 $$

and for the conditioned probability

$$ {\text{S}}_{2} (\hat{\text{P}}_{\text{IED}} ) = - \ln \left( {0.39^{2} + 0.16^{2} + 0.42^{2} + 0.03^{2} } \right) = - \ln \left( {0.355} \right) = 1.04 $$

Again, as for the other two measures, the resulting value for \( \hat{\text{P}} \) is less than for P. We also want to consider, briefly, the effect of larger values of the parameter α. For example, from Eq. 8 for S∞ we have

$$ {\text{S}}_{\infty } \left( {{\text{P}}_{\text{IED}} } \right) = 0.92 > {\text{S}}_{\infty } (\hat{\text{P}}_{\text{IED}} ) = 0.87 $$

This continues the pattern of \( \hat{\text{P}}_{\text{IED}} \) being evaluated as more informative, though we note the difference is somewhat smaller.
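A sketch of the two Renyi cases used here, with illustrative helper names; the printed values agree with those quoted above.

```python
import math

def renyi(p, alpha):
    """Renyi entropy S_alpha(P) for alpha != 1; alpha = 2 is the collision entropy."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def renyi_inf(p):
    """Limit alpha -> infinity (Eq. 8): S_inf(P) = -ln(max p_i)."""
    return -math.log(max(p))

p_ied = [0.3, 0.2, 0.4, 0.1]
p_hat_ied = [0.39, 0.16, 0.42, 0.03]   # conditioned distribution (rounded)
print(round(renyi(p_ied, 2), 2), round(renyi(p_hat_ied, 2), 2))    # 1.2 1.04
print(round(renyi_inf(p_ied), 2), round(renyi_inf(p_hat_ied), 2))  # 0.92 0.87
```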

Next we can utilize the already determined sums of the squared probabilities from the Gini measure to evaluate S2 for the first three possibility cases.

3.5.1 Case 1

For the probability distribution Pcc we see

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = - { \ln }\left( { 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {0^{2} } } \right) = - {\text{ln 1}} = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right) $$

but for Pcu

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 < {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) = { \ln }\left( {\text{n}} \right) $$

3.5.2 Case 2

Since \( \hat{\text{P}} = {\text{P}}, \)

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right){\text{ and S}}_{ 2} (\hat{\text{P}}) = { \ln }\left( {\text{n}} \right) = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right) $$

3.5.3 Case 3

For the completely certain probability as before,

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = 0 = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cc}} } \right) $$

and for Pcu

$$ {\text{S}}_{2} (\hat{\text{P}}) = - \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\hat{\text{p}}_{\text{i}}^{2} } + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {\hat{\text{p}}_{\text{i}}^{2} } } \right) = - \ln \left( {\sum\limits_{{\text{i = 1}}}^{\text{m}} {\left( {\frac{1}{\text{m}}} \right)^{2} + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} 0 } } \right) = - \ln \left( {\frac{1}{\text{m}}} \right) = \ln \left( {\text{m}} \right) $$

Again since m < n,

$$ {\text{S}}_{ 2} (\hat{\text{P}}) = { \ln }\left( {\text{m}} \right) < { \ln }\left( {\text{n}} \right) = {\text{S}}_{ 2} \left( {{\text{P}}_{\text{cu}} } \right). $$

So we note that the values of S2 in these specific cases are the same as for the Shannon entropy measure; however, the exact numeric values obtained, for example in Eq. 3, are not identical, so we conclude there is a close but not exact relationship between the two measures.

3.6 Example: less informative conditioned probability

Next consider the following example for possibility and probability distributions in order to illustrate that not all \( \hat{\text{P}} \)'s are more informative than an initial probability P. We shall apply our previous information measures and see that these are consistent in their assessments. So let the possibility and probability distributions be:

$$ \varPi = \{ 0.1,\,0.1,\,1.0,\,0.1\} ;\quad {\text{P}} = \{ 0.8,\,0.1,\,0.05,\,0.05\} $$

As before we can compute \( \hat{\text{P}} \)

$$ {\text{K}} = 0.8 \times 0.1 + 0.1 \times 0.1 + 1.0 \times 0.05 + 0.1 \times 0.05 = 0.08 + 0.01 + 0.05 + 0.005 = 0.145 $$

and

$$ {\hat{\text{p}}}_{1} = 0.8 \times 0.1/0.145 = 0.552;\; \ldots ;\;{\hat{\text{p}}}_{4} = 0.05 \times 0.1/0.145 = 0.034 $$
$$ \hat{\text{P}} = \{ 0.552,\,0.069,\,0.345,\,0.034\} $$

We can see intuitively that there is some degree of conflict or lack of consistency between the possibility and probability distributions. For example, for the largest probability, p1 = 0.8, the possibility is quite low, π1 = 0.1. Then where π3 = 1.0, we observe that the corresponding probability, p3 = 0.05, is one of the two lowest probability values.

Now we can assess this situation with the information measures. Starting with Shannon entropy we have the smaller entropy for the initial probability distribution:

$$ {\text{S}}\left( {\text{P}} \right) = 0.708 < {\text{S}}(\hat{\text{P}}) = 0.995 $$

Likewise, the Gini index yields a similar result for this case involving some degree of conflict, indicating that the conditioned probability \( \hat{\text{P}} \) is less informative:

$$ {\text{G}}\left( {\text{P}} \right) = 1 - 0.655 = 0.345 < {\text{G}}(\hat{\text{P}}) = 1 - 0.43 = 0.57 $$

Finally we obtain similar results for the Renyi entropies, S2 and S:

$$ {\text{S}}_{2} \left( {\text{P}} \right) = - \ln \left( {0.655} \right) = 0.423 < {\text{S}}_{2} ({\hat{\text{P}}}) = - \ln \left( {0.430} \right) = 0.844 $$
$$ {\text{S}}_{\infty } \left( {\text{P}} \right) = 0.22 < {\text{S}}_{\infty } (\hat{\text{P}}) = 0.59 $$

So the information measures are compatible with our intuitive assessment of the conflict between Π and P. In the next section, where we discuss Zadeh's consistency measure in some detail, we will see that this measure also indicates a lower consistency for these distributions.
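The sketch below, with illustrative helper names, gathers all of the measures for this conflicting example; the conditioned distribution uses the rounded values given above, so the outputs reproduce the quoted figures up to the last decimal place.

```python
import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def renyi2(p):
    return -math.log(sum(pi * pi for pi in p))

def renyi_inf(p):
    return -math.log(max(p))

poss = [0.1, 0.1, 1.0, 0.1]
p = [0.8, 0.1, 0.05, 0.05]
k = sum(pi * ppi for pi, ppi in zip(p, poss))   # Zadeh consistency / normalizer
p_hat = [0.552, 0.069, 0.345, 0.034]            # conditioned distribution (rounded)

print(round(k, 3))                                         # 0.145
print(round(shannon(p), 3), round(shannon(p_hat), 3))      # 0.708 0.995
print(round(gini(p), 3), round(gini(p_hat), 3))            # 0.345 0.57
print(round(renyi2(p), 3), round(renyi2(p_hat), 3))        # 0.423 0.845
print(round(renyi_inf(p), 2), round(renyi_inf(p_hat), 2))  # 0.22 0.59
```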

4 Consistency evaluations of distributions

In this section, we use Zadeh's consistency measure as another approach to assess the integration of uncertainty representations, as a supplement to the information measures of the previous section. We shall see that this measure yields evaluations compatible with those information measures.

As noted by Sudkamp (1992), a probability–possibility transformation is a “purely mechanical manipulation of the distribution without regard to the underlying problem domain or evidence”. It does not by itself provide guidance of the usefulness of the outcome.

For example, reconsider the result of Theorem 1 with respect to the initial probability distribution. Let pk be a very low probability, i.e. one representing a “rare” event, 0 < pk ≪ 1; however, as we have seen, \( \hat{\text{p}}_{\text{k}} = 1 \), which indicates that although the probability was very small, the possibility distribution asserts that the corresponding event did occur in this particular instance. Furthermore, if the initial probability pk were actually 0, then K = 0 and the conditioned probability is ill defined, as we have the indeterminate result 0/0. Clearly, such results by themselves are unhelpful for decision making, and we will see that the consistency measure reflects this.

A number of consistency measures for probability and possibility distributions have been proposed (Delgado and Moral 1987; Gupta 1993). As we discussed in the introduction, Zadeh's approach,

$$ \text{C}_{\text{Z}} (\Pi ,\rm P) =\sum\limits_{{\text{i = 1}}}^{\text{n}} {{\pi}_{\text{i}} \times } \text{p}_{\text{i}} $$
(9)

is identical to the expression for K in the conditioned probability approach. This measure does not represent an inherent relationship but rather represents the intuition that a lowering of an event’s possibility tends to lower its probability, but not the converse.

Another consistency measure that appears in the literature, CDP (Π, P), is due to Dubois and Prade (1982, 1983). Here for every subset A of the space X,

$$ {\text{C}}_{\text{DP}} \left( {\varPi ,{\text{ P}}} \right) = 1 {\text{ if}}\,\varPi \left( {\text{A}} \right) \ge {\text{P}}\left( {\text{A}} \right) $$
(10)

and is 0 otherwise. This definition is based on the idea that possibility is a weaker representation of a situation than probability.

For our purposes, we focus here on CZ as it provides a range of values with which to evaluate the idea of consistency as it relates to the possibilistic conditioning approach. We can note that the maximum value that \( {\text{C}}_{\text{Z}} \left( {\Pi ,{\text{ P}}} \right) = \sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\pi}_{\text{i}} \times {\text{p}}_{\text{i}} } \) can attain is 1, since \( \sum\nolimits_{{\text{i = 1}}}^{\text{n}} {{\text{p}}_{\text{i}} } = 1 \) and πi is at most 1. Thus, the range of CZ is the interval [0, 1], where 0 can be considered complete inconsistency and 1 complete consistency. In a more general sense we can relate this to the concept of conflict, of which consistency is only one aspect. Conflict is generally thought of as involving broader semantic issues such as source reliability and trustworthiness.

Thus, for the case in Theorem 1 where pk = 0 when πk = 1, evaluation of CZ yields

$$ \begin{aligned} \text{C}_{\text{Z}} \text{(}\Pi \text{,P)} & = {\pi}_{\text{k}} \times \text{p}_{\text{k}} + \sum\limits_{{\text{i} \ne \text{k}}}^{\text{n}} {{\pi}_{\text{i}} \times \text{p}_{\text{i}} } \\ &= 1 \times 0 + \sum\limits_{{\text{i} \ne \text{k}}}^{\text{n}} {0 \times \text{p}_{\text{i}} } \\ & = 0 \\ \end{aligned} $$

This result implies that these distributions are indeed inconsistent, i.e. a total conflict, and we should not expect a valid conditional probability distribution to be produced for such a situation. Resolution of such a conflict can be managed by considerations of semantic issues such as the reliability of the underlying information sources.
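In an implementation, this situation can be detected simply by checking CZ (equivalently K) before conditioning; the sketch below, with illustrative names, flags the total-conflict case just described.

```python
def consistency(p, poss):
    """Zadeh's consistency C_Z(poss, p) = sum of poss_i * p_i (Eq. 9)."""
    return sum(pi * ppi for pi, ppi in zip(p, poss))

def condition(p, poss):
    k = consistency(p, poss)
    if k == 0:
        raise ValueError("C_Z = 0: total conflict, conditioning is undefined (0/0)")
    return [pi * ppi / k for pi, ppi in zip(p, poss)]

poss = [0.0, 1.0, 0.0]    # completely certain possibility on the second element
p = [0.5, 0.0, 0.5]       # but that element has zero probability
print(consistency(p, poss))   # 0.0
try:
    condition(p, poss)
except ValueError as err:
    print(err)                # reports the total conflict instead of dividing by zero
```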

4.1 Zadeh’s consistency measure, Cz, for four possibility cases

We now evaluate Zadeh's consistency measure for the four possibility cases of Sect. 2, some of which show a conflict; a comparative evaluation for example distributions follows in the next subsection. Note that, as a validity check, the conditioned distribution does indeed sum to 1 wherever there is no conflict.

Case 1:

For Π = {1, 0,…, 0}:

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {0 \times 0 = 1;{\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = 1\times \frac{1}{\text{n}} + } \sum\limits_{{\text{i = 2}}}^{\text{n}} {0 \times } \frac{1}{\text{n}} = \frac{1}{\text{n}} $$

For Pcc, this result shows complete consistency, since only the outcome carrying all the probability is considered possible. For Pcu, the measure indicates that there is considerable inconsistency.

Case 2:

For case 2, Π(1, 1,…, 1), which is complete uncertainty, no distinctions are made relative to the probabilities and so both Pcc and Pcu are consistent with the possibility distribution.

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} { 1\times 0 = 1;{\text{ C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = } \sum\limits_{{\text{i = 1}}}^{\text{n}} {1 \times } \frac{1}{\text{n}} = {\text{n}} \times \frac{1}{\text{n}} = 1. $$

Case 3, Pcc subcase 1; pt = 1; t \( \le \) m

With the intermediate possibility case 3 for Pcc where pt = 1 and t \( \le \) m we have

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cc}} } \right) = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{m}} { 1\times 0 + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0 \times 0 = 1;} } $$

Case 3, Pcc subcase 2; pt = 1; t > m

As noted in Sect. 2, there is a problem since πt = 0, but pt = 1. As a result, we have zero for the consistency measure,

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}} \right) = \sum\limits_{{\text{i = 1}}}^{\text{m}} { 1\times 0 + \left( {{\text{p}}_{\text{t}} = { 1}} \right) \times \left( {\pi_{\text{t}} = 0} \right) + 0 \times 0 = 0.} $$

This result implies that these distributions are completely inconsistent or in conflict. Thus, no valid conditional probability distribution can be produced for such a situation.

Case 3, Pcu Complete uncertainty

$$ {\text{C}}_{\text{Z}} \left( {\varPi ,{\text{ P}}_{\text{cu}} } \right) = \sum\limits_{{\text{i = 1}}}^{\text{m}} {1 \times \frac{1}{\text{n}}} + \sum\limits_{{\text{i = m + 1}}}^{\text{n}} {0 \times \frac{1}{\text{n}}} = \frac{\text{m}}{\text{n}} < 1 $$

Similar to Case 1 for Pcu, some of the original probabilities, here n − m of them, are not compatible with the possibility distribution, as reflected in the consistency measure. That is, the inconsistency here is due to the contrast between the n − m zero values of Π and the corresponding values of Pcu.

Case 4, Pcc Subcase 1; t = 1, pt = 1:

Finally, for the general possibility case, Case 4, where Pcc = {1, 0,…, 0},

$$ {\text{C}}_{\text{Z}} = 1\times 1+ \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{w}}_{\text{i}} \times 0 = 1.} $$

Case 4, Pcc Subcase 2; t > 1, pt = 1:

Since 0 < wi < 1, we know not all probabilities are fully supported by the possibility distribution. Here, however, all wi > 0, so we do not have a conflict as in Case 3, Subcase 2 above, since CZ (Π, P) = wt × pt = wt with 0 < wt < 1.

Case 4, Pcu complete uncertainty

$$ {\text{C}}_{\text{Z}} = 1\times \frac{1}{\text{n}} + \sum\limits_{{\text{i = 2}}}^{\text{n}} {{\text{w}}_{\text{i}} \times } \frac{1}{\text{n}} < \frac{1}{\text{n}} + \frac{{\text{n} - \text{1}}}{\text{n}} = 1 $$

4.2 Consistency for example distributions

Next let us consider the consistency for the IED example. For these distributions, if we recall the value of K then we have

$$ {\text{C}}_{\text{Z}} (\varPi_{\text{IED}} ,{\text{ P}}_{\text{IED}} ) = {\text{K}} = 0.3 + 0.12 + 0.32 + 0.02 = 0.76. $$

We have seen that the values of each of the three information measures we have evaluated for \( \hat{\text{P}}_{\text{IED}} \) are less than their values for PIED. At issue is how specific values of consistency relate to the information measure values. Relative to the range of CZ, 0.76 is reasonably large. We can next see how this consistency value compares to that of the example distributions for which the information assessments showed the conditioned probability to be less informative.

Consider again the possibility and probability distributions of Sect. 3.6 above. For these we observed that all the information measures indicated the conditioned probability \( \hat{\text{P}} \) was less informative than the initial probability P. For these distributions the consistency measure is

$$ {\text{C}}_{\text{Z}} (\varPi ,{\text{ P}}) = {\text{K}} = 0.08 + 0.01 + 0.05 + 0.005 = 0.145 $$

Clearly this consistency value is quite low compared to CZ(ΠIED, PIED). So we can observe that higher consistency values are generally correlated with more informative conditioned probabilities.
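The comparison is straightforward to reproduce: CZ is just the normalizer K of Eq. (1), computed here with an illustrative helper for the two examples.

```python
def consistency(p, poss):
    """Zadeh's consistency C_Z(poss, p) = sum of poss_i * p_i (Eq. 9)."""
    return sum(pi * ppi for pi, ppi in zip(p, poss))

# IED example: reasonably high consistency.
print(round(consistency([0.3, 0.2, 0.4, 0.1], [1.0, 0.6, 0.8, 0.2]), 2))    # 0.76

# Conflicting example of Sect. 3.6: quite low consistency.
print(round(consistency([0.8, 0.1, 0.05, 0.05], [0.1, 0.1, 1.0, 0.1]), 3))  # 0.145
```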

Situations like this can occur in many applications. For example with web assistant agents, uncertainty aggregation appears in the integration of information from sources such as user profiles, proximity-based fuzzy clustering and knowledge-based discovery (Loia et al. 2006).

5 Summary

Decision makers are constantly faced with making choices in complex situations for which they have imperfect and often conflicting information, and they face difficult decisions in making effective use of it. Typically such a mix of information has a variety of associated uncertainty, but ultimately the decision maker must come to specific conclusions or actions based on it. Our research here has developed preliminary approaches to assist in this process by providing information-theory-based quantitative evaluations to guide decisions.

We have developed exact expressions for the conditioned probability based on the extreme cases, completely certain and completely uncertain. For these cases three information measures were applied and yielded compatible results when comparing the informativeness of the original versus the conditioned probability. As well, we carried out the possibilistic conditioning and information evaluations for numerical examples. Additionally, we used the Zadeh consistency measure and have seen that it correlates well with the evaluation results.

We are currently doing research on the aggregation of both multiple possibility distributions and multiple probability distributions. This will allow us to potentially take advantage of such additional information sources before computing the conditioned probability. Also we are developing environments to carry out Monte Carlo simulations to test the conditioning approach and the evaluation measures. We are investigating ways to apply such simulations to actual decision-making and to assess whether more effective outcomes result when the evaluation measures have indicated that the conditioned probability is more informative.