Elsevier

Information Sciences

Volume 190, 1 May 2012, Pages 1-16

A comparative study of rough sets for hybrid data

https://doi.org/10.1016/j.ins.2011.12.006

Abstract

To discover knowledge from hybrid data using rough sets, researchers have developed several fuzzy rough set models and a neighborhood rough set model. These models have been applied to many hybrid data processing applications for a particular purpose, thus neglecting the issue of selecting an appropriate model. To address this issue, this paper mainly concerns the relationships among these rough set models. Investigating fuzzy and neighborhood hybrid granules reveals an important relationship between these two granules. Analyzing the relationships among rough approximations of these models shows that Hu’s fuzzy rough approximations are special cases of neighborhood and Wang’s fuzzy rough approximations, respectively. Furthermore, one-to-one correspondence relationships exist between Wang’s fuzzy and neighborhood rough approximations. This study also finds that Wang’s fuzzy and neighborhood rough approximations are cut sets of Dubois’ fuzzy rough approximations and Radzikowska and Kerre’s fuzzy rough approximations, respectively.

Introduction

In real-world databases, data sets usually take on hybrid forms, i.e., categorical and numerical data coexist. Feature selection, classification and prediction for hybrid data thus hold great significance. Generally speaking, there are two strategies in hybrid data processing. One strategy employs classical numerical data processing methods, including PCA [24], neural networks [6], [14] and SVM [37]. With these methods, all categorical values in the hybrid data must be coded as integers. However, processing categorical data in this manner is unreasonable, as the coded values lack practical meaning [11]. The other strategy employs classical categorical data processing methods, including rough set theory [1], [18], [20], [21], [22], [25], [28], [30], [31], [32], [36], [39], [47]. Problems occur when numerical data are processed using traditional rough set theory: the numerical data must first be discretized into categorical data, and discretization incurs information loss [11], [46]. Both strategies thus have their own limits.
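To make the information-loss point concrete, the following is a minimal sketch of equal-width discretization. The function name `equal_width_bin`, the value range and the sample values are all made up for illustration and do not come from the paper.

```python
def equal_width_bin(value, low, high, bins):
    """Map a numerical value in [low, high] to a bin index 0..bins-1."""
    width = (high - low) / bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - low) / width), bins - 1)


# Hypothetical numerical attribute values (illustrative only).
values = [36.4, 36.9, 37.1, 39.8]
codes = [equal_width_bin(v, 35.0, 41.0, 3) for v in values]
print(codes)  # [0, 0, 1, 2]
```

After discretization, 36.4 and 36.9 become indistinguishable, while 36.9 and 37.1 are separated even though they differ by only 0.2. The original distances between values can no longer be recovered from the codes.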

Researchers have recently proposed several hybrid data processing methods [2], [7], [11], [12], [15], [26], [29], [34], [35], [38], [40], frequently using fuzzy and neighborhood rough set models. Fuzzy sets and rough sets are complementary in handling uncertainty [3], [4], [8], [10], [13], [23], [27], [43]. Dubois and Prade [7] combined rough and fuzzy set theory to define the first fuzzy rough sets. This model employed the min and max fuzzy operators to describe the fuzzy lower and upper approximations. Radzikowska and Kerre [33] defined fuzzy rough sets in a more general manner based on the T-equivalence relation. The fuzzy lower and upper approximations were constructed by an implicator and a triangular norm. Mi and Zhang [25] presented a new fuzzy rough set definition based on a residual implication θ and its dual σ. Hu et al. [11] introduced a novel fuzzy rough model, presented several attribute significance measures and designed a forward greedy algorithm for hybrid attribute reduction. Wang et al. [38] defined new lower and upper approximations based on the similarity between two objects and extended some underlying concepts to the fuzzy environment. Yeung et al. [46] first defined some lower and upper approximations based on arbitrary fuzzy relations from the constructive approach viewpoint. Several of the fuzzy rough set models mentioned above are commonly used to process hybrid data [7], [11], [35], [38]. Hybrid data analysis has also employed another generalization of traditional rough sets: the neighborhood rough set [12], [16], [41], [42], [44], [45]. Neighborhoods and neighborhood relations are important concepts in topology. Lin [19] regarded neighborhood spaces as topological spaces more general than equivalence spaces and introduced neighborhood relations into rough set methodology. The notion of neighborhood systems provided a convenient and flexible tool for representing similarity and described a hybrid information system with categorical and numerical attributes. Wu and Zhang [41] explicitly discussed the properties of neighborhood approximation spaces. Yao [43], [45] relaxed the original query with a neighborhood system to conduct approximation retrieval. Hu et al. [12] constructed a unified theoretical framework for a neighborhood-based classifier using a neighborhood-based rough set model and a forward feature set selection algorithm for hybrid data.
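As an illustration of the min and max operators mentioned for the Dubois and Prade model, the sketch below uses the commonly cited form of that definition on a finite universe: lower(x) = min over y of max(1 − R(x, y), A(y)), and upper(x) = max over y of min(R(x, y), A(y)). The function name, the fuzzy relation `R` and the fuzzy set `A` are assumptions chosen for illustration; the excerpt itself does not spell out the formulas.

```python
def dubois_prade_approximations(relation, membership):
    """Return (lower, upper) membership lists for every object.

    relation[i][j] is the fuzzy similarity R(x_i, x_j) in [0, 1];
    membership[j] is the degree A(x_j) of the fuzzy set being approximated.
    """
    lower, upper = [], []
    for row in relation:
        # lower(x) = min_y max(1 - R(x, y), A(y))
        lower.append(min(max(1.0 - r, a) for r, a in zip(row, membership)))
        # upper(x) = max_y min(R(x, y), A(y))
        upper.append(max(min(r, a) for r, a in zip(row, membership)))
    return lower, upper


# Toy reflexive, symmetric fuzzy relation on three objects (made up).
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
A = [1.0, 0.6, 0.0]  # toy fuzzy set to approximate

lower, upper = dubois_prade_approximations(R, A)
print(lower)  # [0.6, 0.6, 0.0]
print(upper)  # [1.0, 0.8, 0.3]
```

Because the toy relation is reflexive, each lower membership is at most A(x) and each upper membership is at least A(x), which mirrors the usual containment of the lower approximation in the upper one.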

Some fuzzy and neighborhood rough set models mentioned above have been used to process hybrid data. However, a user cannot know which rough set model is appropriate when analyzing a given data set, making it difficult to select the appropriate model for a specific case. Solving this problem requires exploring the inherent relationships among the existing models, which helps researchers identify these generalized rough sets and select a proper model for a given application. This paper illustrates these relationships from two perspectives: constructing information granules and their rough approximations. It first analyzes the relationship between the constructions of fuzzy and neighborhood hybrid granules, since information granules are the basis for rough approximations in rough set models. The paper then explores relationships among these rough approximations in the existing rough set models. This research clarifies the inherent relationships among these existing models.

The rest of the paper is organized as follows. Section 2 reviews some preliminary concepts. Section 3 analyzes the relationship between fuzzy hybrid granules and neighborhood hybrid granules. Section 4 introduces five rough set models for hybrid data. Section 5 investigates the relationships among the models, and the last section concludes the paper.

Section snippets

Preliminaries

Several fuzzy rough set models and the neighborhood rough set model are capable of processing hybrid data. To clarify the relationships among them, this section reviews some basic concepts, which facilitates the understanding of the remainder of this paper.

Comparison of hybrid information granules

In this section, hybrid information granules are divided into two types: crisp and fuzzy hybrid granules. The following subsections explicitly investigate the relationship between them.
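The distinction between the two granule types can be sketched as follows. This is only an illustration under stated assumptions: the distance `hybrid_distance` (a categorical mismatch counts as 1, a numerical attribute scaled to [0, 1] contributes its absolute difference, and the maximum is taken), the similarity 1 − distance, and the sample data are all hypothetical choices, not the constructions defined in the paper.

```python
def hybrid_distance(x, y, categorical):
    """Assumed distance between two objects with mixed attributes."""
    parts = []
    for i, (a, b) in enumerate(zip(x, y)):
        if i in categorical:
            parts.append(0.0 if a == b else 1.0)  # categorical: match or not
        else:
            parts.append(abs(a - b))  # numerical, assumed scaled to [0, 1]
    return max(parts)


def crisp_granule(data, k, delta, categorical):
    """Crisp (neighborhood) granule: indices within distance delta of object k."""
    return {j for j, y in enumerate(data)
            if hybrid_distance(data[k], y, categorical) <= delta}


def fuzzy_granule(data, k, categorical):
    """Fuzzy granule: similarity degree of every object to object k."""
    return [1.0 - hybrid_distance(data[k], y, categorical) for y in data]


# Hypothetical objects: (categorical attribute, numerical attribute).
data = [("a", 0.10), ("a", 0.25), ("b", 0.15), ("a", 0.90)]
categorical = {0}

print(crisp_granule(data, 0, 0.2, categorical))  # {0, 1}
print(fuzzy_granule(data, 0, categorical))       # degrees near 1.0, 0.85, 0.0, 0.2
```

The crisp granule only records which objects are close enough, while the fuzzy granule keeps a graded degree for every object. In this particular toy construction, thresholding the fuzzy degrees at 1 − delta returns the crisp granule.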

Rough approximations for hybrid data

Defining rough approximations (lower and upper approximations) is a key problem for a rough set model. In this section, we review several common rough approximations for hybrid data.
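For readers new to the terminology, a minimal sketch of crisp lower and upper approximations is given below, using the standard granule-based form: an object belongs to the lower approximation if its granule is contained in the target set, and to the upper approximation if its granule intersects the target set. The granules and target set are made-up toy values, and the function name is an assumption.

```python
def rough_approximations(granules, target):
    """Return (lower, upper) approximations of `target`.

    granules maps each object to the set of objects in its granule
    (for example, its neighborhood).
    """
    # Lower: the whole granule lies inside the target set.
    lower = {x for x, g in granules.items() if g <= target}
    # Upper: the granule shares at least one object with the target set.
    upper = {x for x, g in granules.items() if g & target}
    return lower, upper


# Toy reflexive, symmetric granules on four objects (illustrative only).
granules = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}, 3: {3}}
target = {0, 1}

lower, upper = rough_approximations(granules, target)
print(lower)          # {0}
print(upper)          # {0, 1, 2}
print(upper - lower)  # boundary region: {1, 2}
```

Objects in the lower approximation certainly belong to the target concept, objects outside the upper approximation certainly do not, and the boundary region holds the objects that cannot be decided from the granules alone.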

Comparing rough approximations for hybrid data

Hu’s fuzzy, neighborhood and Wang’s fuzzy rough approximations for hybrid data are all crisp object sets, whereas Dubois’ and Radzikowska and Kerre’s fuzzy rough approximations for hybrid data are fuzzy object sets. These rough approximations are divided into two types: crisp and fuzzy hybrid rough approximations. This section investigates the relationships among them.
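The link between fuzzy and crisp approximations runs through cut sets, which turn a fuzzy object set into a crisp one by thresholding membership degrees. The sketch below shows a plain (weak) cut with a made-up level and made-up membership values; which cut levels and whether weak or strong cuts are used in the paper's results is not stated in this excerpt.

```python
def cut_set(membership, level):
    """Weak cut: objects whose membership degree is at least `level`."""
    return {x for x, m in membership.items() if m >= level}


# Hypothetical fuzzy lower and upper approximations of some concept.
fuzzy_lower = {"x1": 0.6, "x2": 0.6, "x3": 0.0}
fuzzy_upper = {"x1": 1.0, "x2": 0.8, "x3": 0.3}

print(sorted(cut_set(fuzzy_lower, 0.5)))  # ['x1', 'x2']
print(sorted(cut_set(fuzzy_upper, 0.5)))  # ['x1', 'x2']
print(sorted(cut_set(fuzzy_upper, 0.3)))  # ['x1', 'x2', 'x3']
```

Lowering the level can only enlarge the resulting crisp set, so a family of nested crisp approximations is obtained from a single fuzzy approximation.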

Conclusions

This paper clarifies the relationships among the generalized rough set models for hybrid data. To this end, we investigated the relationships among the rough sets from two viewpoints: constructing information granules and rough approximations. We first investigated in detail the construction of fuzzy and neighborhood hybrid granules. We then analyzed the relationships among these rough approximations. We came to the following conclusions: Hu’s fuzzy rough approximations are special

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 71031006, 70971080, 60903110), Special prophase project for the National Key Basic Research and Development Program of China (973) (No. 2011CB311805), the Foundation of Doctoral Program Research of the Ministry of Education of China (20101401110002) and the Natural Science Foundation of Shanxi Province (Nos. 2009021017-1, 2010021017-3).

References (47)

  • R. Jensen et al., Fuzzy-rough nearest neighbour classification and prediction, Theoretical Computer Science (2011)
  • J.Y. Liang et al., A new measure of uncertainty based on knowledge granulation for rough sets, Information Sciences (2009)
  • G.L. Liu et al., Invertible approximation operators of generalized rough sets and fuzzy rough sets, Information Sciences (2010)
  • J.S. Mi et al., An axiomatic characterization of a fuzzy generalization of rough sets, Information Sciences (2004)
  • N.N. Morsi et al., Axiomatics for fuzzy rough sets, Fuzzy Sets and Systems (1998)
  • Y. Ouyang et al., On fuzzy rough sets based on tolerance relations, Information Sciences (2010)
  • W. Pedrycz et al., Feature analysis through information granulation and fuzzy sets, Pattern Recognition (2002)
  • Y.H. Qian et al., Set-valued ordered information systems, Information Sciences (2009)
  • Y.H. Qian et al., Positive approximation: an accelerator for attribute reduction in rough set theory, Artificial Intelligence (2010)
  • Y.H. Qian et al., MGRS: a multi-granulation rough set, Information Sciences (2010)
  • A.M. Radzikowska et al., A comparative study on fuzzy-rough sets, Fuzzy Sets and Systems (2002)
  • Q. Shen et al., Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring, Pattern Recognition (2004)
  • X.Z. Wang et al., Learning fuzzy rules from fuzzy samples based on rough set technique, Information Sciences (2007)