
2008 | Book


Rough – Granular Computing in Knowledge Discovery and Data Mining

Author: Jarosław Stepaniuk

Publisher: Springer Berlin Heidelberg

Book series: Studies in Computational Intelligence

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


Table of Contents

Frontmatter

Introduction

Abstract

The amount of electronic data available is growing very fast and this explosive growth in databases has generated a need for new techniques and tools that can intelligently and automatically extract implicit, previously unknown, hidden and potentially useful information and knowledge from these data. These tools and techniques are the subject of the fields of knowledge discovery in databases and data mining.

Jarosław Stepaniuk

Part I: Rough Set Methodology

Frontmatter

Rough Sets

Abstract

Rough set theory, due to Zdzisław Pawlak (1926-2006) [106, 108, 109, 110], is a mathematical approach to imperfect knowledge. The problem of imperfect knowledge has been tackled for a long time by philosophers, logicians and mathematicians. Recently it has become a crucial issue for computer scientists as well, particularly in the area of computational intelligence [129], [99]. There are many approaches to the problem of how to understand and manipulate imperfect knowledge. The most successful one is, no doubt, the fuzzy set theory proposed by Lotfi A. Zadeh [226]. Rough set theory presents another attempt to solve this problem. It is based on the assumption that objects are perceived through partial information about them. As a result, some objects can be indiscernible from one another. Indiscernible objects form elementary granules. It follows that some sets cannot be described exactly by the available information about objects: they are rough, not crisp. Any rough set is characterized by its lower and upper approximations.

Jarosław Stepaniuk
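
The notions in the Rough Sets abstract above (indiscernibility, elementary granules, lower and upper approximations) can be illustrated with a minimal sketch. It assumes the classical equivalence-based definition, in which two objects are indiscernible when they agree on all chosen attributes. The table, the attribute names and the function names are hypothetical and are not taken from the book.

```python
from collections import defaultdict


def elementary_granules(table, attributes):
    """Group object ids that share the same values on `attributes`."""
    granules = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        granules[key].add(obj)
    return list(granules.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for granule in elementary_granules(table, attributes):
        if granule <= target:   # granule lies entirely inside the target
            lower |= granule
        if granule & target:    # granule overlaps the target
            upper |= granule
    return lower, upper


# Hypothetical toy table: x1 and x2 are indiscernible.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}
flu = {"x1", "x3"}  # the concept to approximate (assumed)

lower, upper = approximations(table, ["headache", "temp"], flu)
print(sorted(lower), sorted(upper))
```

Here x1 belongs to the concept but x2, which looks identical, does not, so the granule {x1, x2} appears only in the upper approximation. The concept is therefore rough with respect to these two attributes.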

Data Reduction

Introduction

Nowadays, we deal with large data tables that include up to billions of objects and up to several thousands of attributes. We often face the question of whether we can remove some data from a data table while preserving its basic properties, that is, whether the table contains superfluous data. This chapter provides an introduction to rough set based data preprocessing methods, which are concerned with the selection of attributes to reduce dimensionality and improve the data for subsequent data mining analysis.

Jarosław Stepaniuk
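
As a rough illustration of what "superfluous" attributes can mean, the sketch below searches for minimal attribute subsets that induce the same indiscernibility partition as the full attribute set (often called reducts). It is an exhaustive search, exponential in the number of attributes, and is meant only for tiny tables; it is not claimed to be the method the chapter uses. The table and the function names are hypothetical.

```python
from itertools import combinations


def partition(table, attributes):
    """Partition object ids by their values on `attributes`."""
    blocks = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks.setdefault(key, set()).add(obj)
    return {frozenset(block) for block in blocks.values()}


def reducts(table, attributes):
    """Return all minimal non-empty subsets preserving the full partition."""
    full = partition(table, attributes)
    found = []
    # Visit subsets by increasing size so that minimal ones are found first.
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if partition(table, subset) == full:
                found.append(subset)
    return found


# Hypothetical toy table in which attribute "c" duplicates "b".
table = {
    "x1": {"a": 0, "b": 0, "c": 0},
    "x2": {"a": 0, "b": 1, "c": 1},
    "x3": {"a": 1, "b": 0, "c": 0},
    "x4": {"a": 1, "b": 1, "c": 1},
}
result = reducts(table, ["a", "b", "c"])
print(result)
```

In this toy table either "b" or "c" can be dropped without losing the ability to distinguish any pair of objects, so there are two reducts.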

Part II: Classification and Clustering

Frontmatter

Selected Classification Methods

Abstract

Any classification algorithm should consist of some classifiers together with a method for resolving conflicts between the classifiers when new objects are classified. In this chapter we discuss two classes of classification algorithms. Algorithms from the first class use sets of decision rules as classifiers together with some method of conflict resolution. The rules are generated from decision tables with tolerance relations using the Boolean reasoning approach, and they describe the decision classes. However, to classify a new object into the proper decision class, it is necessary to fix a method for resolving conflicts between rules that recognize the object but vote for different decisions. We also discuss how such decision rules can be generated using Boolean reasoning. Algorithms of the second class are based on the nearest neighbor method (k-NN) (see the top ten data mining algorithms [217]). We show how this method can be combined with a rough set method for relevant attribute selection.

Jarosław Stepaniuk
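
The conflict between rules voting for different decisions, mentioned in the abstract above, can be sketched as follows. The example assumes one simple strategy, support-weighted voting, in which each matching rule votes with the number of training objects that supported it. This strategy, the rules and the attribute names are illustrative assumptions, not the specific method fixed in the chapter.

```python
from collections import Counter

# Each rule: (conditions, decision, support). All values are hypothetical.
rules = [
    ({"temp": "high"}, "flu", 3),
    ({"headache": "no"}, "healthy", 2),
    ({"temp": "high", "cough": "yes"}, "flu", 1),
]


def matches(conditions, obj):
    """A rule recognizes an object if every condition holds for it."""
    return all(obj.get(attr) == value for attr, value in conditions.items())


def classify(obj, rules, default=None):
    """Resolve conflicts by summing the support of all matching rules."""
    votes = Counter()
    for conditions, decision, support in rules:
        if matches(conditions, obj):
            votes[decision] += support
    if not votes:
        return default  # no rule recognizes the object
    # Ties are broken in favor of the decision that received a vote first.
    return votes.most_common(1)[0][0]


new_object = {"temp": "high", "headache": "no", "cough": "yes"}
prediction = classify(new_object, rules)
print(prediction)
```

All three rules recognize the new object. The two "flu" rules collect a total weight of 4 against 2 for "healthy", so the conflict is resolved in favor of "flu".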

Selected Clustering Methods

Abstract

In this chapter we recall the concept of clustering and discuss in detail some selected algorithms.

The organization of this chapter is as follows. In Section 5.1 selected clustering algorithms are discussed. In Section 5.2 the self-organizing system for information granulation is recalled. In Section 5.3 we discuss some clustering algorithms obtained by combining clustering with methods of rough set theory. In Section 5.4 the quality of information granulation is discussed.

Jarosław Stepaniuk

A Medical Case Study

Abstract

Many interesting applications of rough set methods have been reported. Let us mention only some of the medical applications: risk pattern identification in the treatment of infants with respiratory failure [6], treatment of duodenal ulcer by HSV [111], analysis of data from peritoneal lavage in acute pancreatitis [165], knowledge acquisition in nursing [45], analysis of medical databases (e.g. headache, meningitis, CVD) [204, 205], image analysis for medical applications [61, 94], surgical wound infection [66], preterm birth prediction [45], medical decision-making on board space station Freedom (NASA Johnson Space Center) [45], diagnosis of progressive encephalopathy [209], data preparation for data mining in medical data sets [56], selection of important attributes for medical diagnosis systems [57], visualization of rough set decision rules for medical diagnosis systems [58], automatic detection of speech disorders [24], rough set-based filtration of sound applicable to hearing prostheses [23], and discovery of attribute dependences in diabetes data [187].

Jarosław Stepaniuk

Part III: Complex Data and Complex Concepts

Frontmatter

Mining Knowledge from Complex Data

Introduction

One important type of complex knowledge can occur when mining data from multiple relations. In most domains, the objects of interest are not independent of each other and are not of a single type. For example, on the World Wide Web:

Text has a list structure. We consider sequences of words.
HTML has a tree structure (nested tags).
Hyperlinks have a graph structure (linked pages).

In fact, most real domains have combinations of different types of internal and external structure nested at multiple levels of abstraction. We need data mining systems that can soundly mine the rich structure of relations among objects, such as interlinked Web pages, social networks, metabolic networks in the cell, etc. Yet another important problem is how to mine non-relational data, for example data described by formulas of first-order logic.

Jarosław Stepaniuk

Complex Concept Approximations

Abstract

The rough set approach was further developed to deal with granules more compound than elementary granules. In this chapter, we present a methodology for modeling compound granules using the rough set approach. The methodology is based on operations on information systems. Such modeling has two basic steps. In the first step, new granules are constructed from objects represented by granules in some already constructed information systems. These new granules are used as objects in the newly constructed information systems. In the second step, the features of the new granules are added. This approach can be used for modeling, e.g., compound granules in spatio-temporal reasoning.

Jarosław Stepaniuk
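
The two-step construction described in the abstract above can be sketched on a toy temporal example. The sketch assumes that the new granules are windows of consecutive time-stamped objects and that their features are simple aggregates. Both choices, as well as all names and values, are hypothetical illustrations and not the operations defined in the chapter.

```python
def build_window_granules(system, order, width):
    """Step 1: form new objects (granules) from windows of existing objects."""
    return {
        tuple(order[i:i + width]): [system[obj] for obj in order[i:i + width]]
        for i in range(len(order) - width + 1)
    }


def add_features(granules, features):
    """Step 2: describe each new granule by features of its member objects."""
    return {
        name: {feature: fn(rows) for feature, fn in features.items()}
        for name, rows in granules.items()
    }


# Base information system: one object per time point (hypothetical values).
base = {
    "t1": {"temp": 36.6},
    "t2": {"temp": 37.4},
    "t3": {"temp": 38.1},
    "t4": {"temp": 37.0},
}

# Features of the new granules (assumed for illustration).
features = {
    "max_temp": lambda rows: max(r["temp"] for r in rows),
    "rising": lambda rows: all(
        a["temp"] < b["temp"] for a, b in zip(rows, rows[1:])
    ),
}

granules = build_window_granules(base, ["t1", "t2", "t3", "t4"], width=2)
new_system = add_features(granules, features)
for name, row in new_system.items():
    print(name, row)
```

The resulting `new_system` is itself an information system whose objects are windows, so the same construction could be applied to it again to reach a higher level of granulation.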

Part IV: Conclusions, Bibliography and Further Readings

Frontmatter

Concluding Remarks

Abstract

In this book we have outlined a methodology for knowledge discovery and data mining by means of rough–granular computing. Several research directions are related to rough–granular computing. We enclose a list of such directions together with examples of problems.

Jarosław Stepaniuk

Backmatter

Title: Rough – Granular Computing in Knowledge Discovery and Data Mining
Author: Jarosław Stepaniuk
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-70801-8
Print ISBN: 978-3-540-70800-1
DOI: https://doi.org/10.1007/978-3-540-70801-8
