On modeling similarity and three-way decision under incomplete information in rough set theory☆
Introduction
Rough set theory, proposed by Pawlak [1], is an effective tool for deriving decision rules from data. A fundamental notion of rough set based data analysis is an indiscernibility relation on a set of objects [2]. Two objects are indiscernible with respect to a set of attributes if they have the same values on all of these attributes [2]. Objects with the same description form an equivalence class, and the family of all equivalence classes is a partition of the universe. By using equivalence classes as basic building blocks, one can approximate a subset of objects by three pairwise disjoint regions, namely, the positive, negative, and boundary regions [1]. Yao [3], [4], [5], [6] introduced a theory of three-way decision as thinking and processing in threes. Interpreting rules in rough set theory in terms of three regions is an example of three-way decision. Specifically, from the three regions one can derive acceptance, rejection, and non-commitment rules for making three-way decisions.
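As a concrete illustration of the three regions under complete information, the following Python sketch groups the objects of a small table into equivalence classes and then places each class in the positive, negative, or boundary region of a target set. The table, the attribute names, and the functions `equivalence_classes` and `three_regions` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition objects by their description on attrs (Pawlak indiscernibility)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def three_regions(partition, target):
    """Return the (positive, negative, boundary) regions of target."""
    pos, neg, bnd = set(), set(), set()
    for block in partition:
        if block <= target:
            pos |= block  # block lies entirely inside the target: accept
        elif not (block & target):
            neg |= block  # block is disjoint from the target: reject
        else:
            bnd |= block  # block overlaps the target only partly: defer
    return pos, neg, bnd


# Hypothetical toy table with two attributes.
table = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "small"},
    "o3": {"color": "blue", "size": "small"},
    "o4": {"color": "blue", "size": "large"},
    "o5": {"color": "blue", "size": "large"},
}
X = {"o1", "o2", "o4"}
pos, neg, bnd = three_regions(equivalence_classes(table, ["color", "size"]), X)
print(sorted(pos), sorted(neg), sorted(bnd))
# ['o1', 'o2'] ['o3'] ['o4', 'o5']
```

Here {o4, o5} falls in the boundary region because o4 belongs to X while o5, which has the same description, does not; a rule built from that description can therefore only be a non-commitment rule.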
An assumption of Pawlak rough set analysis is that an object takes exactly one value on each attribute and, furthermore, that we know this value. However, in many situations the available information about some objects may be incomplete, and we may not know their actual values on some attributes. In addition, it may be necessary to distinguish two categories of values: “applicable values” and “non-applicable values”. For an applicable value, the actual value must exist, but we may not know it or may only know a range of possibilities. A non-applicable value arises when an attribute is not applicable to certain objects, so that their values cannot be stated; it may be viewed as a special type of missing value. Under such incomplete information, we may not know the exact descriptions of some objects, and equivalence relations are no longer appropriate. Many authors have proposed and investigated different types of non-equivalence relations to model similarity, including tolerance relations [7], similarity relations [8], conditional tolerance relations [9], and characteristic relations [10], [11], [12], [13], [14], [15], [16]. Indiscernibility is a special type of similarity. Just as an indiscernibility relation is essential for deriving rules from complete information, a similarity relation plays the same essential role for deriving rules from incomplete information. Different similarity models rest on different semantics of incomplete information. However, there is as yet no conceptual framework for studying incomplete information from a semantic point of view.
Kryszkiewicz [7] treats an unknown value as a “do-not-care value” that may be replaced by any known value of the attribute. Stefanowski and Tsoukiàs [8] consider two types of incomplete information: “missing value” and “absent value”. The “missing value” semantics allows comparison operations on a missing value, whereas the “absent value” semantics does not allow any comparison. Grzymala-Busse [11], [13] considers two types of incomplete information: “do-not-care value” and “lost value”. He further divides “do-not-care value” into three categories according to their comparison ranges: “do-not-care value”, “restricted do-not-care value”, and “attribute-concept value”. A “do-not-care value” may be replaced by any known value of the attribute. A “restricted do-not-care value” may be replaced only by known values of the attribute, excluding “lost values”. An “attribute-concept value” may be replaced only by known values of the attribute that occur within the same concept. A “lost value” is one whose original value existed but, for a variety of reasons, is no longer accessible.
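The contrast between a symmetric tolerance relation and a non-symmetric similarity relation can be made concrete. The sketch below assumes an unknown value is encoded as the string `"*"` (a hypothetical encoding) and compares a tolerance relation in the style of Kryszkiewicz [7], where an unknown value on either side matches anything, with a non-symmetric similarity relation in the style of Stefanowski and Tsoukiàs [8], where only the known values of the first object must be matched. The function names `tolerant` and `similar` are illustrative and not taken from the paper.

```python
STAR = "*"  # hypothetical encoding of an unknown value


def tolerant(x, y, attrs):
    """Tolerance: on every attribute the values agree or at least one is unknown."""
    return all(x[a] == y[a] or STAR in (x[a], y[a]) for a in attrs)


def similar(x, y, attrs):
    """Non-symmetric similarity: every known value of x is matched by y."""
    return all(x[a] == STAR or x[a] == y[a] for a in attrs)


attrs = ["a", "b"]
x = {"a": 1, "b": STAR}
y = {"a": 1, "b": 2}

print(tolerant(x, y, attrs), tolerant(y, x, attrs))  # True True
print(similar(x, y, attrs), similar(y, x, attrs))    # True False

# Tolerance is not transitive: u and v each tolerate w, but not each other.
u, w, v = {"a": 1}, {"a": STAR}, {"a": 2}
print(tolerant(u, w, ["a"]), tolerant(w, v, ["a"]), tolerant(u, v, ["a"]))
# True True False
```

The tolerance relation is reflexive and symmetric but not transitive, while this similarity relation is reflexive and transitive but not symmetric; neither is an equivalence relation.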
Based on existing studies of the semantics of incomplete information, we summarize four types of semantics: “do-not-care value”, “partially-known value”, “class-specific value”, and “non-applicable value”. Lipski [17], [18] presented a possible-world semantics for incomplete information in databases. We adopt the possible-world semantics to study different types of incomplete information tables based on the four types of semantics. An incomplete information table is characterized by a family of complete information tables. Each complete information table in the family is a candidate for the actual table in one possible world, and exactly one of them is the actual table, namely the one we would know if the information were complete. We unify different definitions of similarity relations by transforming an incomplete information table into a set-valued table, which allows a common possible-world semantics.
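A minimal sketch of the possible-world reading, under simplifying assumptions: a do-not-care value, encoded as `"*"`, becomes the whole attribute domain; a partially-known value is given directly as a Python set; a known value becomes a singleton; and the cells of the table are assumed to vary independently. Class-specific and non-applicable values are not modelled here. Each choice of one value per cell yields one possible world, and two distinct objects are possibly indiscernible when some world gives them identical descriptions, which under the independence assumption reduces to a non-empty intersection on every attribute. All names and data are hypothetical.

```python
from itertools import product

UNKNOWN = "*"  # hypothetical encoding of a do-not-care value


def to_set_valued(table, domains):
    """Map every cell to the set of values it may take in some possible world."""
    sv = {}
    for obj, row in table.items():
        sv[obj] = {}
        for a, v in row.items():
            if v == UNKNOWN:
                sv[obj][a] = set(domains[a])  # do-not-care: any domain value
            elif isinstance(v, (set, frozenset)):
                sv[obj][a] = set(v)  # partially known: a range of possibilities
            else:
                sv[obj][a] = {v}  # known value
    return sv


def possible_worlds(sv):
    """Yield every complete table obtained by picking one value per cell."""
    cells = [(obj, a) for obj in sv for a in sv[obj]]
    for combo in product(*(sorted(sv[obj][a]) for obj, a in cells)):
        world = {obj: {} for obj in sv}
        for (obj, a), value in zip(cells, combo):
            world[obj][a] = value
        yield world


def possibly_indiscernible(sv, x, y):
    """Some world gives x and y the same description (cells vary independently)."""
    return all(sv[x][a] & sv[y][a] for a in sv[x])


domains = {"color": ["red", "blue"], "size": ["small", "large"]}
table = {
    "o1": {"color": "red", "size": UNKNOWN},
    "o2": {"color": {"red", "blue"}, "size": "small"},
    "o3": {"color": "blue", "size": "large"},
}
sv = to_set_valued(table, domains)
worlds = list(possible_worlds(sv))
print(len(worlds))                             # 4
print(possibly_indiscernible(sv, "o1", "o2"))  # True
print(possibly_indiscernible(sv, "o1", "o3"))  # False
```

Exactly one of the four enumerated worlds is the actual table; the set-valued table records what is known without saying which one.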
To demonstrate the value of the proposed framework, we discuss three-way decisions in an incomplete information table based on rough set approximations defined by a similarity relation. This is essentially a generalization of three-way decision with Pawlak rough sets [3]. Three-way decisions are inspired by a common practice in human decision-making with three options, namely, acceptance, rejection, and non-commitment. There are many studies on the theory and applications of three-way decisions [3], [4], [5], [19], [20], [21], [22], [23]. Three-way decision under incomplete information extends the potential applications of standard rough sets and is worthy of further investigation.
This paper focuses on the conceptual modeling of similarity. Two important computational and practical issues are not covered, and they constitute two limitations of this paper. The first is the design of efficient algorithms for constructing a similarity relation. The second is the selection of the most suitable type of similarity relation for a particular application, since different applications may require different types. How to choose a type of similarity relation and how to compute it efficiently in a given application are interesting problems for further study.
The rest of this paper is organized as follows. Section 2 gives a four-step model of Pawlak rough set analysis under complete information. Section 3 presents four types of semantics of incomplete information for defining different classes of incomplete information tables. Section 4 studies the different definitions of similarity relations and similarity classes and discusses their relationships. Section 5 discusses three-way decisions in an incomplete information table.
Equivalence of objects under complete information
Pawlak rough set analysis (RSA) offers a unique approach to classification problems based on the notions of discernibility and indiscernibility of objects according to their descriptions. Fig. 1 presents a four-step framework for a conceptual understanding of RSA. It largely follows the seminal book by Pawlak [2], with slight modifications.
As shown in Fig. 1, the input of RSA is an information table that describes a set of objects by using a set of attributes.
Definition 2.1 An information table is
A framework for modeling similarity of objects under incomplete information
The review of RSA in Section 2 shows that indiscernibility is one of its central notions. In many practical applications, however, the available information may be incomplete. We divide incomplete information into two categories: “applicable values” and “non-applicable values”. An applicable value must exist, but we may not know it or may only know a range of possibilities. In some cases, an attribute may not be applicable to some objects. For example, an attribute
Two methods to similarity
Similarity plays an essential role in rule acquisition from an incomplete information table. A similarity relation is in one-to-one correspondence with a family of similarity classes. In this section, we discuss the two methods shown in Fig. 3 for defining similarity relations and similarity classes, and we study the relationships among different similarity relations.
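The one-to-one correspondence between a relation and its family of similarity classes can be checked mechanically. The sketch below assumes the convention that the similarity class of x is R(x) = {y : (x, y) ∈ R}; some authors use the converse {y : (y, x) ∈ R}, which gives different classes when R is not symmetric. The family must be indexed by objects: as an unindexed collection of sets, objects with equal classes would collapse and the relation could not be recovered. The function names are hypothetical.

```python
def classes_from_relation(universe, relation):
    """Similarity class of each x: all y such that (x, y) is in the relation."""
    return {x: {y for y in universe if (x, y) in relation} for x in universe}


def relation_from_classes(classes):
    """Rebuild the relation from the object-indexed family of classes."""
    return {(x, y) for x, cls in classes.items() for y in cls}


U = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2)}  # reflexive but not symmetric
classes = classes_from_relation(U, R)
# classes == {1: {1, 2}, 2: {2}, 3: {3}}
print(relation_from_classes(classes) == R)  # True
```

Because both functions are inverse to each other, any property stated for the relation (reflexivity, symmetry, transitivity) can equivalently be stated for the indexed family of classes.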
Three-way decision based on similarity
In rough set theory under complete information, the three-way approximations of a subset of objects, i.e., the positive, negative, and boundary regions, serve as a basis for three-way decisions. We formulate acceptance rules, rejection rules, and non-commitment rules from the positive, negative, and boundary regions, respectively [3], [4], [5], [19], [20], [21], [22], [23]. In this section, we extend the idea of three-way decision to situations with incomplete information.
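One common way to carry the three regions over to incomplete tables is element-based: replace the equivalence class of x by its similarity class S(x), and put x in the positive region if S(x) ⊆ X, in the negative region if S(x) ∩ X = ∅, and in the boundary region otherwise. The sketch below uses a tolerance relation with unknown values encoded as `"*"`; both the choice of relation and this element-based definition are assumptions for illustration and may differ from the definitions adopted in this section. The table and function names are hypothetical.

```python
STAR = "*"  # hypothetical encoding of an unknown value


def tolerance_class(table, x, attrs):
    """All y whose values on attrs agree with x's or are unknown on either side."""
    return {
        y
        for y in table
        if all(
            table[x][a] == table[y][a] or STAR in (table[x][a], table[y][a])
            for a in attrs
        )
    }


def three_way(table, attrs, target):
    """Accept, reject or defer each object based on its tolerance class."""
    pos, neg, bnd = set(), set(), set()
    for x in table:
        s = tolerance_class(table, x, attrs)
        if s <= target:
            pos.add(x)  # acceptance
        elif not (s & target):
            neg.add(x)  # rejection
        else:
            bnd.add(x)  # non-commitment
    return pos, neg, bnd


table = {
    "o1": {"a": 1, "b": 0},
    "o2": {"a": 1, "b": 0},
    "o3": {"a": 2, "b": STAR},
    "o4": {"a": 2, "b": 1},
    "o5": {"a": 3, "b": 1},
}
pos, neg, bnd = three_way(table, ["a", "b"], {"o1", "o2", "o3"})
print(sorted(pos), sorted(neg), sorted(bnd))
# ['o1', 'o2'] ['o5'] ['o3', 'o4']
```

Because the tolerance relation is reflexive, x ∈ S(x), so every accepted object lies in X and every rejected object lies outside it. Object o4 lands in the boundary region although it is not in X, because o3, whose unknown value makes it tolerant with o4, is in X.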
In an incomplete information table, the
Conclusions
This paper focuses on the different interpretations and definitions of similarity based on different semantics of incomplete information. We summarize four types of semantics of incomplete information and present a general definition of an incomplete information table. We identify two methods to study similarity relations and similarity classes in an incomplete information table. By transforming an incomplete information table into a set-valued table, we uniformly study the relationships
Acknowledgments
The authors thank the reviewers for their constructive comments. This work was supported in part by the National Natural Science Foundation of China (Grant No. 61473239), the China Scholarship Council (Grant No. 201707000052), and a Discovery Grant from NSERC, Canada.
References (47)
Three-way decisions with probabilistic rough sets. Inform. Sci. (2010).
Three-way decision and granular computing. Internat. J. Approx. Reason. (2018).
Three-way conflict analysis: reformulations and extensions of the Pawlak model. Knowl.-Based Syst. (2019).
Rough set approach to incomplete information systems. Inform. Sci. (1998).
Dependence-space-based attribute reductions in inconsistent decision information systems. Internat. J. Approx. Reason. (2008).
Maximal consistent block technique for rule acquisition in incomplete information systems. Inform. Sci. (2003).
Generalized probabilistic approximations of incomplete data. Internat. J. Approx. Reason. (2014).
Updating three-way decisions in incomplete multi-scale information systems. Inform. Sci. (2019).
Three-way decision approaches to conflict analysis using decision-theoretic rough set theory. Inform. Sci. (2017).
A sequential three-way approach to multi-class decision. Internat. J. Approx. Reason. (2019).
A temporal-spatial composite sequential approach of three-way granular computing. Inform. Sci.
The two sides of the theory of rough sets. Knowl.-Based Syst.
Structured approximations as a basis for three-way decisions in rough set theory. Knowl.-Based Syst.
Structured probabilistic rough set approximations. Internat. J. Approx. Reason.
Three-way decision perspectives on class-specific attribute reducts. Inform. Sci.
Rule acquisition and complexity reduction in formal decision contexts. Internat. J. Approx. Reason.
Approximate concept construction with three-way decisions and attribute reduction in incomplete contexts. Knowl.-Based Syst.
Generalized approximations defined by non-equivalence relations. Inform. Sci.
Rough classification in incomplete information systems. Math. Comput. Modelling.
Set-valued information systems. Inform. Sci.
Set-valued ordered information systems. Inform. Sci.
Fuzzy rough set model for set-valued data. Fuzzy Sets and Systems.
Representation of nondeterministic information. Theoret. Comput. Sci.
☆
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105251.