Uncertainty measurement for interval-valued decision systems based on extended conditional entropy

doi:10.1016/j.knosys.2011.10.013

Knowledge-Based Systems

Volume 27, March 2012, Pages 443-450

https://doi.org/10.1016/j.knosys.2011.10.013

Abstract

Uncertainty measures can supply new points of view for analyzing data and help reveal the substantive characteristics of data sets. Some uncertainty measures for single-valued information systems or single-valued decision systems have been developed. However, there are few studies on uncertainty measurement for interval-valued information systems or interval-valued decision systems. This paper addresses the uncertainty measurement problem in interval-valued decision systems. An extended conditional entropy is proposed for interval-valued decision systems based on the possible degree between interval values. Building on it, a concept called rough decision entropy is introduced to evaluate the uncertainty of an interval-valued decision system. In addition, the original approximation accuracy measure proposed by Pawlak is extended to deal with interval-valued decision systems, and the concept of interval approximation roughness is presented. Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems. Experimental results also indicate that the rough decision entropy measure outperforms the interval approximation roughness measure.

Introduction

Rough set theory, originally proposed by Pawlak and discussed in greater detail in [1], [2], has become a popular approach for the joint management of uncertainty and vagueness and has been applied in many fields [3], [4], [5], [6], [7].

Pawlak [2] proposed two numerical measures, accuracy and roughness, to evaluate the uncertainty of a rough set in information systems, as well as the approximation accuracy of a rough classification in decision systems. Several efforts have been made to extend Pawlak’s uncertainty model. Based on granulation, a measure of the uncertainty of a set in an information system and of the approximation accuracy of a rough classification in a decision table was proposed in [8].

Information entropy, proposed by Shannon [9] in information theory, has been an effective and powerful mechanism for characterizing the information content in diverse models. The measurement of uncertain information by entropy has been deployed in a wide range of fields. Entropy and its variants were adapted for rough sets in [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. For example, Düntsch and Gediga defined the information entropy and three kinds of conditional entropies in rough sets for predicting a decision attribute [12]. Beaubouef et al. [13] proposed a method for measuring the uncertainty of rough sets and rough relational databases based on rough entropy. Wierman [11] presented measures of uncertainty and granularity in rough set theory, along with an axiomatic derivation. Yao et al. [14] studied several kinds of information-theoretical measures for attribute importance in rough set theory. Liang et al. [16] proposed a new method for evaluating both uncertainty and fuzziness. Qian and Liang [19] proposed a combination entropy for evaluating the uncertainty of knowledge in an information system. However, the methods mentioned above are based on single-valued information systems.

Interval-valued information systems (or interval information systems) are an important type of data table and generalize single-valued information systems [22]. Several authors have studied interval-valued information systems and interval-valued decision systems [22], [23], [24], [25], [26]. Yao et al. [23], [24] presented a model for the interval set by using the lower and upper approximations in interval-valued information systems, and introduced the generalized decision logic. Leung et al. [26] investigated a rough set approach to discovering classification rules through a process of knowledge induction that selects decision rules with a minimal set of features in interval-valued information systems. Qian et al. [22] proposed a dominance relation for interval information systems. Yang et al. [25] presented a dominance relation and generated the optimal decision rules in incomplete interval-valued information systems. Wu and Liu [27] introduced real formal concept analysis based on grey-rough set theory by using grey numbers, and proposed a grey-rough set approach to Galois lattice reduction. So far, however, there are few studies on the uncertainty measurement issue for interval-valued information systems (corresponding to unsupervised learning) or interval-valued decision systems (corresponding to supervised learning). In this paper, we address the uncertainty measurement issue in interval-valued decision systems and aim to construct effective uncertainty measures for them. A similarity relation based on the possible degree between two interval numbers is given, under which the concept of extended conditional entropy is proposed. Based on the proposed concept of conditional entropy, a measure of uncertainty for interval-valued decision systems called rough decision entropy is presented. In addition, the original approximation accuracy measure proposed by Pawlak is extended to deal with interval-valued decision systems, and the concept of interval approximation roughness is presented. Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems. Experimental results also indicate that the rough decision entropy measure outperforms the interval approximation roughness measure.

The rest of this paper is organized as follows. Some basic concepts and notations in rough set theory are introduced in Section 2. In Section 3, several key concepts of our method for interval-valued decision systems are described in detail, including the similarity degree, the θ-conditional entropy, and the θ-rough decision entropy. Some illustrative examples are also given. Simulation experiments are conducted in Section 4 to test and verify the effectiveness of the proposed measure. Section 5 concludes the paper.

Section snippets

Preliminary knowledge

At first, some basic concepts in rough set theory are reviewed, including decision system, indiscernibility relation and approximation regions.
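As a reminder of how these classical notions fit together, the following is a minimal Python sketch of the single-valued case only: indiscernibility classes, lower and upper approximations, and Pawlak's accuracy and roughness. It is not the interval-valued extension developed in this paper. The toy table, the attribute names `a`, `b`, `d`, and all function names are hypothetical, and returning an accuracy of 1 for an empty upper approximation is an assumed convention.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Group object ids whose values agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) approximations of target w.r.t. the classes."""
    lower, upper = set(), set()
    for block in classes:
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def accuracy_and_roughness(lower, upper):
    """Pawlak accuracy |lower| / |upper| and roughness 1 - accuracy."""
    # Assumed convention: an empty upper approximation gives accuracy 1.
    accuracy = len(lower) / len(upper) if upper else 1.0
    return accuracy, 1.0 - accuracy


# Hypothetical toy decision table: condition attributes a, b; decision d.
table = {
    "x1": {"a": 0, "b": 1, "d": "yes"},
    "x2": {"a": 0, "b": 1, "d": "no"},
    "x3": {"a": 1, "b": 0, "d": "yes"},
    "x4": {"a": 1, "b": 1, "d": "no"},
}

classes = indiscernibility_classes(table, ["a", "b"])
target = {obj for obj, row in table.items() if row["d"] == "yes"}
lower, upper = approximations(classes, target)
accuracy, roughness = accuracy_and_roughness(lower, upper)
```

In this toy table, x1 and x2 are indiscernible on the condition attributes but carry different decisions, so the decision class "yes" cannot be described exactly and its roughness is nonzero.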

Similarity relation between two intervals

Ranking interval values is quite different from ranking real values [28], [29], [30]. In this section, a similarity measure based on the possible degree is constructed to compare two interval values.

Definition 1 [28]

Let A = [a⁻, a⁺] and B = [b⁻, b⁺] be two interval values. The possible degree of interval value A relative to interval value B is defined as:

$P_{(A \geqslant B)} = \min \{1, \max \{\frac{a^{+} - b^{-}}{(a^{+} - a^{-}) + (b^{+} - b^{-})}, 0\}\}$

P_(A⩾B) can be viewed as the possible degree to which interval value A is greater than interval value B.
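The formula in Definition 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `possible_degree` and the representation of intervals as `(lower, upper)` tuples are assumptions, and because the formula is undefined when both intervals have zero width, the sketch raises an error in that case instead of choosing a convention.

```python
def possible_degree(a, b):
    """Possible degree P(A >= B) for intervals a = (a_lo, a_hi), b = (b_lo, b_hi)."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must satisfy lower <= upper")
    total_width = (a_hi - a_lo) + (b_hi - b_lo)
    if total_width == 0:
        # Both intervals are single points; the formula divides by zero here.
        raise ValueError("possible degree is undefined for two point intervals")
    # Clip the ratio to the range [0, 1], as in Definition 1.
    return min(1.0, max((a_hi - b_lo) / total_width, 0.0))


# Hypothetical intervals for illustration.
p_ab = possible_degree((1.0, 3.0), (2.0, 4.0))  # overlapping intervals
p_ba = possible_degree((2.0, 4.0), (1.0, 3.0))  # same pair, roles swapped
p_hi = possible_degree((5.0, 6.0), (1.0, 2.0))  # A lies entirely above B
p_lo = possible_degree((1.0, 2.0), (5.0, 6.0))  # A lies entirely below B
```

For the overlapping pair, the two directions give 0.25 and 0.75, which shows that the possible degree depends on the order of its arguments; for the disjoint pair, the clipping yields 1 and 0.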

It is worth noting that P_(A⩾B) ≠ P_(B⩾

Experiments

To test and verify the effectiveness and the validity of the proposed uncertainty measure, some experiments are conducted on different interval-valued decision systems. These interval-valued decision systems are shown in Table 2, Table 3, Table 4 respectively.

There are 10 objects and 6 condition attributes in the interval-valued decision system shown in Table 2. There are 15 objects and 8 condition attributes in the interval-valued decision system shown in Table 3. There are 20 objects and 9

Conclusions

In this paper, we have studied the uncertainty measurement problem in interval-valued decision systems. Extended conditional entropy is proposed in interval-valued decision systems based on possible degree between interval values. Consequently, a concept called rough decision entropy is introduced to measure the uncertainty of an interval-valued decision system. Besides, the original approximation accuracy measure proposed by Pawlak is extended to deal with interval-valued decision systems and

Acknowledgements

The work is supported by the National Natural Science Foundation of China (Nos. 61070074 and 60703038) and the Excellent Young Teachers Program of Zhejiang University.

References (30)

T. Herawan et al.
A rough set approach for selecting clustering attribute
Knowledge-Based Systems
(2010)
P. Yang et al.
Finding key attribute subset in dataset for outlier detection
Knowledge-Based Systems
(2011)
Y.H. Qian et al.
Positive approximation: an accelerator for attribute reduction in rough set theory
Artificial Intelligence
(2010)
J.-Y. Shyng et al.
An integration method combining rough set theory with formal concept analysis for personal investment portfolios
Knowledge-Based Systems
(2010)
F. Min et al.
Test-cost-sensitive attribute reduction
Information Sciences
(2011)
J.Y. Liang et al.
A new measure of uncertainty based on knowledge granulation for rough sets
Information Sciences
(2009)
I. Düntsch et al.
Uncertainty measures of rough set prediction
Artificial Intelligence
(1998)
T. Beaubouef et al.
Information-theoretic measures of uncertainty for rough sets and rough relational databases
Information Sciences
(1998)
Y.H. Qian et al.
Knowledge structure, knowledge granulation and knowledge distance in a knowledge base
International Journal of Approximate Reasoning
(2009)
Y.H. Qian et al.
Interval ordered information systems
Computers and Mathematics with Applications
(2008)
X. Yang et al.
Dominance-based rough set approach to incomplete interval-valued information system
Data and Knowledge Engineering
(2009)
Y. Leung et al.
A rough set approach for the discovery of classification rules in interval-valued information systems
International Journal of Approximate Reasoning
(2008)
Q. Wu et al.
Real formal concept analysis based on grey-rough set theory
Knowledge-Based Systems
(2009)
Y. Nakahara et al.
On the linear programming problems with interval coefficients
Computers and Industrial Engineering
(1992)
Z. Pawlak
Rough sets
International Journal of Computer Information Science
(1982)
