Elsevier

Knowledge-Based Systems

Volume 191, 5 March 2020, 105251

On modeling similarity and three-way decision under incomplete information in rough set theory

https://doi.org/10.1016/j.knosys.2019.105251

Abstract

Although incomplete information is a well-studied topic in rough set theory, there is still no general agreement on the semantics of various types of incomplete information. This has led to some confusion and to many definitions of similarity or tolerance relations on a set of objects that lack a sound semantic justification. The main objective of this paper is to address semantic issues related to incomplete information. We present a four-step model of Pawlak rough set analysis in order to gain insight into how an indiscernibility relation (i.e., an equivalence relation) is defined and used under complete information. The results enable us to propose a conceptual framework for studying the similarity of objects under incomplete information. The framework is based on a classification of four types of incomplete information (i.e., “do-not-care value”, “partially-known value”, “class-specific value”, and “non-applicable value”) and two groups of methods (i.e., relation-based and granule-based methods) for modeling similarity. We examine existing studies on similarity and their relationships. In spite of their semantic differences, all four types of incomplete information can be uniformly represented in a set-valued table. We are therefore able to use a common conceptual possible-world semantics. Finally, to demonstrate the value of the proposed framework, we examine three-way decisions under incomplete information.

Introduction

Rough set theory, proposed by Pawlak [1], is an effective tool for deriving decision rules from data. A fundamental notion of rough set based data analysis is an indiscernibility relation on a set of objects [2]. If two objects have the same values over a set of attributes, they are indiscernible with respect to these attributes [2]. Objects with the same description form an equivalence class and the family of all equivalence classes is a partition of the universe. By using equivalence classes as basic building blocks, one can construct approximations of a subset of objects in terms of three pair-wise disjoint positive, negative, and boundary regions [1]. Yao [3], [4], [5], [6] introduced a theory of three-way decision as thinking and processing in threes. Interpreting rules in rough set theory in terms of three regions is an example of three-way decision. Specifically, from the three regions, one can derive acceptance, rejection, and non-commitment rules for making three-way decisions.
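The construction described above can be illustrated with a minimal Python sketch. The table contents, the attribute names, and the helper names `equivalence_classes` and `three_regions` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict

# Hypothetical complete information table: object -> attribute values.
TABLE = {
    "o1": {"colour": "red", "size": "small"},
    "o2": {"colour": "red", "size": "small"},
    "o3": {"colour": "blue", "size": "small"},
    "o4": {"colour": "blue", "size": "large"},
    "o5": {"colour": "blue", "size": "large"},
}


def equivalence_classes(table, attributes):
    """Group objects that take the same values on the given attributes."""
    groups = defaultdict(set)
    for obj, row in table.items():
        description = tuple(row[a] for a in attributes)
        groups[description].add(obj)
    return list(groups.values())


def three_regions(classes, target):
    """Split the universe into positive, negative, and boundary regions."""
    positive, negative, boundary = set(), set(), set()
    for block in classes:
        if block <= target:              # block lies entirely inside the target
            positive |= block
        elif block.isdisjoint(target):   # block lies entirely outside the target
            negative |= block
        else:                            # block overlaps the target only partly
            boundary |= block
    return positive, negative, boundary


classes = equivalence_classes(TABLE, ["colour", "size"])
pos, neg, bnd = three_regions(classes, target={"o1", "o2", "o4"})
```

In this toy table, objects in the positive region would support acceptance rules, objects in the negative region would support rejection rules, and objects in the boundary region would support non-commitment rules.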

An assumption of Pawlak rough set analysis is that an object takes exactly one value on each attribute and, furthermore, that we know this value. However, in many situations the available information about some objects may be incomplete, and we may not know their actual values on some attributes. In addition, it may be necessary to consider two categories of values: “applicable value” and “non-applicable value”. For the category of applicable values, the actual values must exist, but we may not know the values or may only know a range of possibilities. For the category of non-applicable values, some attributes are not applicable to certain objects and hence their values cannot be stated. A non-applicable value may be viewed as a special type of missing value. Under these circumstances of incomplete information, we may not know the exact descriptions of some objects, and the notion of an equivalence relation is no longer appropriate. Many authors have proposed and investigated different types of non-equivalence relations to model similarity, including tolerance relations [7], similarity relations [8], conditional tolerance relations [9], and characteristic relations [10], [11], [12], [13], [14], [15], [16]. Indiscernibility is a special type of similarity. An indiscernibility relation is essential for deriving rules with complete information; a similarity relation plays the same essential role for deriving rules under incomplete information. Different types of similarity relation models are based on different semantics of incomplete information. However, there does not exist a conceptual framework for studying incomplete information from a semantic point of view.

Kryszkiewicz [7] considers incomplete information as a “do-not-care value” that may be replaced by any known value of an attribute. Stefanowski and Tsoukiàs [8] consider two types of incomplete information: “missing value” and “absent value”. The “missing value” semantics allows comparison operations on a missing value. The “absent value” semantics does not allow any comparison. Grzymala-Busse [11], [13] considers two types of incomplete information: “do-not-care value” and “lost value”. He further divides “do-not-care value” into three categories according to their comparison ranges: “do-not-care value”, “restricted do-not-care value”, and “attribute-concept value”. A “do-not-care value” may be replaced by any known value of the attribute. A “restricted do-not-care value” may only be replaced by known values of the attribute, excluding “lost values”. An “attribute-concept value” may be replaced by any known value that is limited to the same concept. A “lost value” is one whose original value existed but, for a variety of reasons, is no longer accessible.
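As an illustration of the “do-not-care value” semantics, the following Python sketch treats two objects as similar when, on every attribute, their values agree or at least one of them is a do-not-care value. The marker `"*"`, the table, and the function names are assumptions made for this example only.

```python
DO_NOT_CARE = "*"  # hypothetical marker for a do-not-care value

# Hypothetical incomplete information table.
TABLE = {
    "o1": {"colour": "red", "size": "*"},
    "o2": {"colour": "red", "size": "small"},
    "o3": {"colour": "*", "size": "large"},
    "o4": {"colour": "blue", "size": "large"},
}


def tolerant(row_x, row_y, attributes):
    """True if, on every attribute, the values agree or one is do-not-care."""
    return all(
        row_x[a] == row_y[a] or DO_NOT_CARE in (row_x[a], row_y[a])
        for a in attributes
    )


def tolerance_class(table, obj, attributes):
    """All objects that are tolerant with the given object."""
    return {
        other
        for other, row in table.items()
        if tolerant(table[obj], row, attributes)
    }


ATTRIBUTES = ["colour", "size"]
class_o1 = tolerance_class(TABLE, "o1", ATTRIBUTES)
class_o2 = tolerance_class(TABLE, "o2", ATTRIBUTES)
```

In this toy table, o2 is similar to o1 and o1 is similar to o3, yet o2 is not similar to o3. The relation is reflexive and symmetric but not transitive, which is why it is not an equivalence relation.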

Based on the existing studies of different semantics of incomplete information, we summarize four types of semantics of incomplete information: “do-not-care value”, “partially-known value”, “class-specific value”, and “non-applicable value”. Lipski [17], [18] presents a possible-world semantics to discuss incomplete information in databases. We adopt the possible-world semantics to study different types of incomplete information tables based on the four types of semantics. An incomplete information table is characterized by a family of complete information tables. Each complete information table in the family corresponds to a candidate of the actual table in one possible world and only one of the complete information tables is the actual table if information is complete. We unify different definitions of similarity relations by transforming an incomplete information table into a set-valued table, which allows a common possible-world semantics.
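The idea of characterizing a set-valued table by a family of complete tables can be sketched in Python as follows. The table and the function name `possible_worlds` are hypothetical, and the sketch assumes that every cell holds at least one candidate value.

```python
from itertools import product

# Hypothetical set-valued table: each cell holds the set of candidate values.
SET_VALUED = {
    "o1": {"colour": {"red"}, "size": {"small", "large"}},
    "o2": {"colour": {"red", "blue"}, "size": {"small"}},
}


def possible_worlds(table):
    """Yield every complete table obtained by picking one value per cell."""
    cells = [(obj, attr) for obj, row in table.items() for attr in row]
    choices = [sorted(table[obj][attr]) for obj, attr in cells]
    for combo in product(*choices):
        world = {obj: {} for obj in table}
        for (obj, attr), value in zip(cells, combo):
            world[obj][attr] = value
        yield world


worlds = list(possible_worlds(SET_VALUED))
```

The number of possible worlds is the product of the cell sizes, so it grows exponentially with the number of uncertain cells. Full enumeration like this is therefore only practical for very small tables.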

To demonstrate the value of the proposed framework, we discuss three-way decisions in an incomplete information table based on rough set approximations defined by a similarity relation. It is essentially a generalization of three-way decision with Pawlak rough sets [3]. Three-way decisions are inspired by a common practice of human decision-making with three options, namely, acceptance, rejection, and non-commitment. There are many studies on the theory and practical applications of three-way decisions [3], [4], [5], [19], [20], [21], [22], [23]. Three-way decision under incomplete information extends the potential applications of standard rough sets and is worthy of further investigation.

This paper focuses on the conceptual modeling of similarity. Two important computational and practical issues are not covered, and these are two limitations of this paper. The first issue is the design of efficient algorithms for constructing a similarity relation. The second issue is the selection of the most suitable type of similarity relation for a particular application. Different applications may require different types of similarity relations. How to choose a particular type of similarity relation and how to efficiently compute a similarity relation in an application are interesting problems for further study.

The rest of this paper is organized as follows. Section 2 gives a four-step model of Pawlak rough set analysis under complete information. Section 3 presents four types of semantics of incomplete information for defining different classes of incomplete information tables. Section 4 studies the different definitions of similarity relations and similarity classes and discusses their relationships. Section 5 discusses three-way decisions in an incomplete information table.

Section snippets

Equivalence of objects under complete information

Pawlak rough set analysis (RSA) offers a unique approach to classification problems based on the notions of discernibility and indiscernibility of objects according to their descriptions. Fig. 1 presents a four-step framework for a conceptual understanding of RSA. It basically follows from the seminal book by Pawlak [2] with some slight modifications.

As shown in Fig. 1, the input of RSA is an information table that describes a set of objects by using a set of attributes.

Definition 2.1

An information table is

A framework for modeling similarity of objects under incomplete information

By reviewing the RSA in Section 2, we can find that the notion of indiscernibility is one of the central concepts. In many practical applications, the available information may be incomplete. We divide the incomplete information into two categories of “applicable values” and “non-applicable values”. Applicable values must exist, but we may not know the value or only know a range of possibilities. In some cases, an attribute may not be applicable to some objects. For example, an attribute

Two methods to similarity

Similarity plays an essential role in rule acquisition in an incomplete information table. A similarity relation is in one-to-one correspondence with a family of similarity classes. In this section, we discuss two methods in Fig. 3 to define similarity relations and similarity classes and study the relationships among different similarities.
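The correspondence between a similarity relation and its family of similarity classes can be sketched as a round trip in Python. The relation below and the function names are hypothetical; the sketch takes the similarity class of x to be the set of all y such that (x, y) is in the relation, which is one common convention and may differ from the definitions used in the full paper.

```python
def classes_from_relation(universe, relation):
    """Map each object x to the set of objects y with (x, y) in the relation."""
    return {x: {y for y in universe if (x, y) in relation} for x in universe}


def relation_from_classes(classes):
    """Recover the set of pairs (x, y) from the family of similarity classes."""
    return {(x, y) for x, members in classes.items() for y in members}


# Hypothetical reflexive, symmetric, non-transitive relation on three objects.
UNIVERSE = {"a", "b", "c"}
RELATION = {
    ("a", "a"), ("b", "b"), ("c", "c"),
    ("a", "b"), ("b", "a"),
    ("b", "c"), ("c", "b"),
}

classes = classes_from_relation(UNIVERSE, RELATION)
recovered = relation_from_classes(classes)
```

Because each representation can be rebuilt from the other without loss, a definition of similarity may be stated either in terms of pairs of objects or in terms of classes.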

Three-way decision based on similarity

In rough set theory under complete information, three-way approximations of a subset of objects, i.e., the positive, negative, and boundary regions, serve as a basis for three-way decisions. We formulate acceptance rules, rejection rules, and non-commitment rules, respectively, from the three regions [3], [4], [5], [19], [20], [21], [22], [23]. In this section, we extend the ideas of the three-way decision to situations with incomplete information.
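One way to carry the three regions over to similarity classes is sketched below in Python. It assigns an object to the positive region when its similarity class is contained in the target set, to the negative region when its class is disjoint from the target set, and to the boundary region otherwise. This is a common formulation and is offered only as an assumption; the authors' own definitions are not visible in this excerpt. The classes and the function name are hypothetical.

```python
# Hypothetical similarity classes, one per object (classes may overlap).
SIMILARITY_CLASSES = {
    "o1": {"o1", "o2", "o3"},
    "o2": {"o1", "o2"},
    "o3": {"o1", "o3", "o4"},
    "o4": {"o3", "o4"},
}


def three_way_regions(classes, target):
    """Assign each object to a region by comparing its class with the target."""
    positive, negative, boundary = set(), set(), set()
    for obj, similar in classes.items():
        if similar <= target:              # every similar object is in the target
            positive.add(obj)
        elif similar.isdisjoint(target):   # no similar object is in the target
            negative.add(obj)
        else:                              # evidence is mixed
            boundary.add(obj)
    return positive, negative, boundary


pos, neg, bnd = three_way_regions(SIMILARITY_CLASSES, target={"o1", "o2"})
```

Unlike equivalence classes, similarity classes can overlap, so the regions are built object by object rather than block by block.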

In an incomplete information table, the

Conclusions

This paper focuses on the different interpretations and definitions of similarity based on the different semantics of incomplete information. We summarize four types of semantics of incomplete information and present a general definition of an incomplete information table. We identify two methods to study similarity relations and similarity classes in an incomplete information table. By transforming an incomplete information table into a set-valued table, we uniformly study the relationships

Acknowledgments

The authors thank the reviewers for their constructive comments. This work was supported in part by the National Natural Science Foundation of China (Grant No. 61473239), the China Scholarship Council (Grant No. 201707000052), and a Discovery Grant from NSERC, Canada.

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105251.