A comparative study of rough sets for hybrid data
Introduction
In real-world databases, data sets are often hybrid, i.e., categorical and numerical data coexist. Feature selection, classification and prediction for hybrid data are therefore of great significance. Generally speaking, there are two strategies for processing hybrid data. The first employs classical numerical data processing methods, including PCA [24], neural networks [6], [14] and SVM [37]. With these methods, all categorical values in the hybrid data must be coded as integers. However, processing categorical data in this manner is unreasonable, because the coded values lack practical meaning [11]. The second strategy employs classical categorical data processing methods, including rough set theory [1], [18], [20], [21], [22], [25], [28], [30], [31], [32], [36], [39], [47]. Traditional rough set theory, however, cannot handle numerical data directly, so the numerical data must first be discretized into categorical data, and the discretization incurs information loss [11], [46]. Both strategies therefore have their limitations.
Researchers have recently proposed several hybrid data processing methods [2], [7], [11], [12], [15], [26], [29], [34], [35], [38], [40], frequently based on fuzzy and neighborhood rough set models. Fuzzy sets and rough sets are complementary in handling uncertainty [3], [4], [8], [10], [13], [23], [27], [43]. Dubois and Prade [7] combined rough and fuzzy set theory to define the first fuzzy rough sets. This model employed the min and max fuzzy operators to describe the fuzzy lower and upper approximations. Radzikowska and Kerre [33] defined fuzzy rough sets in a more general manner based on the T-equivalence relation, constructing the fuzzy lower and upper approximations by means of an implicator and a triangular norm, respectively. Mi and Zhang [25] presented a new fuzzy rough set definition based on a residual implication θ and its dual σ. Hu et al. [11] introduced a novel fuzzy rough model, presented several attribute significance measures and designed a forward greedy algorithm for hybrid attribute reduction. Wang et al. [38] defined new lower and upper approximations based on the similarity between two objects and extended some underlying concepts to the fuzzy environment. Yeung et al. [46] first defined some lower and upper approximations based on arbitrary fuzzy relations from the viewpoint of the constructive approach. Several of the fuzzy rough set models mentioned above have been used to process hybrid data [7], [11], [35], [38]. Hybrid data analysis has also employed another generalization of traditional rough sets: the neighborhood rough set [12], [16], [41], [42], [44], [45]. Neighborhoods and neighborhood relations are important concepts in topology. Lin [19] regarded neighborhood spaces as more general topological spaces than equivalence spaces and introduced neighborhood relations into rough set methodology.
The notion of neighborhood systems provides a convenient and flexible tool for representing similarity and can describe a hybrid information system with both categorical and numerical attributes. Wu and Zhang [41] explicitly discussed the properties of neighborhood approximation spaces. Yao [43], [45] relaxed the original query with a neighborhood system to conduct approximate retrieval. Hu et al. [12] constructed a unified theoretical framework for a neighborhood-based classifier using a neighborhood-based rough set model and a forward feature subset selection algorithm for hybrid data.
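To make the neighborhood idea concrete, the following is a minimal sketch of crisp neighborhood granules and the lower and upper approximations they induce on a small hybrid table. The table, the threshold `delta`, and the rule "same category and numerical distance at most `delta`" are assumptions chosen for illustration; they are not taken from the cited models, which may use other metrics.

```python
# Hypothetical hybrid table: (categorical colour, numerical weight scaled to [0, 1]).
objects = [
    ("red", 0.10),
    ("red", 0.15),
    ("red", 0.22),
    ("blue", 0.12),
    ("red", 0.60),
]
labels = ["yes", "yes", "no", "no", "no"]


def neighborhood(i, data, delta):
    """Indices of objects that share object i's category and whose numerical
    value lies within delta of object i's value (assumed distance rule)."""
    cat_i, num_i = data[i]
    return {
        j
        for j, (cat_j, num_j) in enumerate(data)
        if cat_j == cat_i and abs(num_j - num_i) <= delta
    }


def approximations(target, data, delta):
    """Crisp lower and upper approximations of the index set `target`."""
    lower, upper = set(), set()
    for i in range(len(data)):
        granule = neighborhood(i, data, delta)
        if granule <= target:   # granule entirely inside the target
            lower.add(i)
        if granule & target:    # granule overlaps the target
            upper.add(i)
    return lower, upper


yes_class = {i for i, label in enumerate(labels) if label == "yes"}
lower, upper = approximations(yes_class, objects, delta=0.1)
print(lower, upper)
```

In this toy table only object 0 is certainly "yes", because the neighborhood of object 1 also contains object 2, which belongs to the other class; objects 1 and 2 therefore fall into the boundary region.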
Some of the fuzzy and neighborhood rough set models mentioned above have been used to process hybrid data. However, when analyzing a given data set, a user has no clear basis for deciding which rough set model is appropriate. Solving this problem requires exploring the inherent relationships among the existing models, which helps researchers distinguish these generalized rough sets and select a proper model for a given application. This paper illustrates these relationships from two perspectives: the construction of information granules and the rough approximations built on them. It first analyzes the relationship between the constructions of fuzzy and neighborhood hybrid granules, since information granules are the basis for rough approximations in rough set models. The paper then explores the relationships among the rough approximations in the existing rough set models. This research clarifies the inherent relationships among these existing models.
The rest of the paper is organized as follows. Section 2 reviews some preliminary concepts. Section 3 analyzes the relationship between fuzzy hybrid granules and neighborhood hybrid granules. Section 4 introduces five rough set models for hybrid data. Section 5 investigates the relationships among the models, and the last section concludes the paper.
Section snippets
Preliminaries
Several fuzzy rough set models and the neighborhood rough set model are capable of processing hybrid data. To clarify the relationships among them, this section reviews some basic concepts, which facilitates the understanding of the remainder of this paper.
Comparison of hybrid information granules
In this section, hybrid information granules are divided into two types: crisp and fuzzy hybrid granules. The following subsections explicitly investigate the relationship between them.
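The distinction between the two granule types can be sketched in a few lines. The similarity function below (0 for differing categories, otherwise 1 minus the absolute difference of a numerical value scaled to [0, 1]) and the sample table are assumptions made for illustration only, not the constructions analyzed in the paper. Under these assumptions, cutting the fuzzy granule at level 1 - delta recovers the crisp delta-neighborhood.

```python
# Hypothetical hybrid table: (categorical value, numerical value scaled to [0, 1]).
table = [
    ("red", 0.10),
    ("red", 0.15),
    ("red", 0.22),
    ("blue", 0.12),
    ("red", 0.60),
]


def similarity(a, b):
    """Assumed fuzzy similarity of two hybrid objects: 0 if the categories
    differ, otherwise 1 - |difference| of the scaled numerical values."""
    (cat_a, num_a), (cat_b, num_b) = a, b
    if cat_a != cat_b:
        return 0.0
    return 1.0 - abs(num_a - num_b)


def fuzzy_granule(i, data):
    """Membership degree of every object in the fuzzy granule of object i."""
    return [similarity(data[i], other) for other in data]


def crisp_granule(i, data, delta):
    """Crisp neighborhood: same category and numerical distance at most delta."""
    cat_i, num_i = data[i]
    return {j for j, (c, v) in enumerate(data) if c == cat_i and abs(v - num_i) <= delta}


def cut(memberships, level):
    """Indices whose membership degree is at least `level`."""
    return {j for j, m in enumerate(memberships) if m >= level}


delta = 0.1
all_match = all(
    cut(fuzzy_granule(i, table), 1.0 - delta) == crisp_granule(i, table, delta)
    for i in range(len(table))
)
print(all_match)
```

The sample values were chosen so that no distance lies exactly on the threshold; at the threshold itself, floating-point rounding could make the two tests disagree.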
Rough approximations for hybrid data
Defining rough approximations (lower and upper approximations) is a key problem for a rough set model. In this section, we review several common rough approximations for hybrid data.
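As one concrete instance, the min/max construction attributed to Dubois and Prade in the introduction is commonly written as a lower approximation inf_y max(1 - R(x, y), A(y)) and an upper approximation sup_y min(R(x, y), A(y)). The sketch below implements that standard form on a finite universe; the relation matrix and the fuzzy set are made-up values, and the code is an illustration rather than the paper's own formulation.

```python
def dubois_prade(relation, fuzzy_set):
    """Fuzzy lower and upper approximations of `fuzzy_set` under the fuzzy
    relation `relation` (a square matrix), using the min and max operators."""
    n = len(fuzzy_set)
    lower = [
        min(max(1.0 - relation[x][y], fuzzy_set[y]) for y in range(n))
        for x in range(n)
    ]
    upper = [
        max(min(relation[x][y], fuzzy_set[y]) for y in range(n))
        for x in range(n)
    ]
    return lower, upper


# Hypothetical reflexive, symmetric fuzzy relation on three objects.
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
# Hypothetical fuzzy set to approximate (membership degree per object).
A = [1.0, 0.6, 0.0]

lower, upper = dubois_prade(R, A)
print(lower)  # [0.6, 0.6, 0.0]
print(upper)  # [1.0, 0.8, 0.3]
```

Because R is reflexive here, each lower membership is at most A(x) and each upper membership is at least A(x). Replacing max(1 - a, b) by another implicator and min by another triangular norm gives the more general implicator and t-norm form mentioned in the introduction.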
Comparing rough approximations for hybrid data
Hu’s fuzzy rough approximations, the neighborhood rough approximations and Wang’s fuzzy rough approximations for hybrid data are all crisp object sets, whereas Dubois and Prade’s and Radzikowska and Kerre’s fuzzy rough approximations for hybrid data are fuzzy object sets. These rough approximations are accordingly divided into two types: crisp and fuzzy hybrid rough approximations. This section investigates the relationships among them.
Conclusions
This paper clarifies the relationships among the generalized rough set models for hybrid data. To this end, we investigated the relationships among the rough sets from two viewpoints: the construction of information granules and rough approximations. We first investigated in detail the construction of fuzzy and neighborhood hybrid granules. We then analyzed the relationships among the rough approximations. We came to the following conclusions: Hu’s fuzzy rough approximations are special
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 71031006, 70971080, 60903110), Special prophase project for the National Key Basic Research and Development Program of China (973) (No. 2011CB311805), the Foundation of Doctoral Program Research of the Ministry of Education of China (20101401110002) and the Natural Science Foundation of Shanxi Province (Nos. 2009021017-1, 2010021017-3).
References (47)
- et al., On fuzzy-rough sets approach to feature selection, Pattern Recognition Letters (2005)
- et al., Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets, Information Sciences (2011)
- et al., Attribute selection with fuzzy decision reducts, Information Sciences (2010)
- et al., Neuro-fuzzy feature evaluation with theoretical analysis, Neural Networks (1999)
- et al., Putting fuzzy sets and rough sets together
- et al., Twofold fuzzy sets and rough sets-some issues in knowledge representation, Fuzzy Sets and Systems (1987)
- et al., Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recognition Letters (2006)
- et al., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition (2007)
- et al., Neighborhood rough set based heterogeneous feature subset selection, Information Sciences (2008)
- et al., Soft fuzzy rough sets for robust feature evaluation and selection, Information Sciences (2010)
- Fuzzy-rough nearest neighbour classification and prediction, Theoretical Computer Science
- A new measure of uncertainty based on knowledge granulation for rough sets, Information Sciences
- Invertible approximation operators of generalized rough sets and fuzzy rough sets, Information Sciences
- An axiomatic characterization of a fuzzy generalization of rough sets, Information Sciences
- Axiomatics for fuzzy rough sets, Fuzzy Sets and Systems
- On fuzzy rough sets based on tolerance relations, Information Sciences
- Feature analysis through information granulation and fuzzy sets, Pattern Recognition
- Set-valued ordered information systems, Information Sciences
- Positive approximation: an accelerator for attribute reduction in rough set theory, Artificial Intelligence
- MGRS: a multi-granulation rough set, Information Sciences
- A comparative study on fuzzy-rough sets, Fuzzy Sets and Systems
- Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring, Pattern Recognition
- Learning fuzzy rules from fuzzy samples based on rough set technique, Information Sciences