2002 | Book

Data Mining and Knowledge Discovery with Evolutionary Algorithms

Author: Dr. Alex A. Freitas

Publisher: Springer Berlin Heidelberg

Book series: Natural Computing Series

Included in: Professional Book Archive

About this book

This book addresses the integration of two areas of computer science, namely data mining and evolutionary algorithms. Both these areas have become increasingly popular in the last few years, and their integration is currently an area of active research.

In essence, data mining consists of extracting valid, comprehensible, and interesting knowledge from data. Data mining is actually an interdisciplinary field, since there are many kinds of methods that can be used to extract knowledge from data. Arguably, data mining mainly uses methods from machine learning (a branch of artificial intelligence) and statistics (including statistical pattern recognition). Our discussion of data mining and evolutionary algorithms is primarily based on machine learning concepts and principles. In particular, in this book we emphasize the importance of discovering comprehensible, interesting knowledge, which the user can potentially use to make intelligent decisions.

In a nutshell, the motivation for applying evolutionary algorithms to data mining is that evolutionary algorithms are robust search methods which perform a global search in the space of candidate solutions (rules or another form of knowledge representation). In contrast, most rule induction methods perform a local, greedy search in the space of candidate rules. Intuitively, the global search of evolutionary algorithms can discover interesting rules and patterns that would be missed by the greedy search.

Table of Contents

Frontmatter

1. Introduction

Abstract

Nowadays there is a huge amount of data stored in real-world databases, and this amount continues to grow fast. As pointed out by [Piatetsky-Shapiro 1991], this creates both an opportunity and a need for (semi-)automatic methods that discover the knowledge “hidden” in such databases. If such knowledge discovery activity is successful, discovered knowledge can be used to improve the decision-making process of an organization.

Alex A. Freitas

2. Data Mining Tasks and Concepts

Abstract

There are several data mining tasks. Each task can be considered as a kind of problem to be solved by a data mining algorithm. Therefore, each task has its own requirements, and the kind of knowledge discovered by solving one task is usually very different — and it is often used for very different purposes — from the kind of knowledge discovered by solving another task. Therefore, the first step in the development of a data mining algorithm is to define which data mining task the algorithm will address.

Alex A. Freitas

3. Data Mining Paradigms

Abstract

As mentioned in the Introduction, since data mining is a very interdisciplinary field, there are many different paradigms of data mining algorithms, such as decision-tree building, rule induction, instance-based learning (or nearest neighbor), neural networks, statistical algorithms, evolutionary algorithms, etc. [Dhar and Stein 1997; Mitchell 1997; Langley 1996; Michie et al. 1994].

Alex A. Freitas

4. Data Preparation

Abstract

No matter how “intelligent” a data mining algorithm is, it will fail to discover high-quality knowledge if it is applied to low-quality data. In this chapter we focus on data preparation methods for data mining. The general goal is to improve the quality of the data being mined, to facilitate the application of a data mining algorithm. Hence, the methods discussed in this chapter can be regarded as a form of preprocessing for a data mining algorithm.

Alex A. Freitas

5. Basic Concepts of Evolutionary Algorithms

Abstract

This chapter discusses some basic concepts and principles of Evolutionary Algorithms (EAs), focusing mainly on Genetic Algorithms (GAs) and Genetic Programming (GP). The main goal of this chapter is to help the reader who is not familiar with these kinds of algorithm to better understand the next chapters of this book.

Alex A. Freitas

6. Genetic Algorithms for Rule Discovery

Abstract

In this chapter we discuss several issues related to developing genetic algorithms (GAs) for prediction-rule discovery. The development of a GA for rule discovery involves a number of nontrivial design decisions. In this chapter we categorize these decisions into five groups, each of them discussed in a separate section, as follows.

Alex A. Freitas

7. Genetic Programming for Rule Discovery

Abstract

In subsection 5.4.4 we saw that standard Genetic Programming (GP) for symbolic regression — where all terminals are real-valued variables or constants and all functions have real-valued inputs and output — can be used for classification, if the numeric value output at the root of the tree is properly interpreted. However, this kind of GP does not produce high-level, comprehensible IF-THEN rules in the style of the rules discovered by rule induction and decision-tree building algorithms (chapter 3) and GAs for rule discovery (chapter 6). As discussed in subsection 1.1.1, rule comprehensibility is important whenever the discovered knowledge is used for decision making by a human user.

Alex A. Freitas
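The chapter 7 abstract notes that a symbolic-regression GP tree can act as a classifier if the numeric value at its root is "properly interpreted". The sketch below illustrates one possible interpretation, a simple threshold on the root output. It is not taken from the book: the `Node` class, the `evaluate` and `classify` helpers, the class labels, and the threshold of 0.0 are all assumptions chosen for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Node:
    """Internal node of a hypothetical GP tree: a real-valued binary function."""
    op: Callable[[float, float], float]
    left: "Tree"
    right: "Tree"


# Leaves are either a variable name (looked up in the record) or a constant.
Tree = Union[Node, str, float]


def evaluate(tree: Tree, record: Dict[str, float]) -> float:
    """Recursively compute the real value output at the root of the tree."""
    if isinstance(tree, Node):
        return tree.op(evaluate(tree.left, record), evaluate(tree.right, record))
    if isinstance(tree, str):
        return record[tree]
    return float(tree)


def classify(tree: Tree, record: Dict[str, float], threshold: float = 0.0) -> str:
    """Assumed interpretation: root output above the threshold means 'positive'."""
    return "positive" if evaluate(tree, record) > threshold else "negative"


# Example individual encoding the expression (x1 * 2.0) - x2.
tree = Node(operator.sub, Node(operator.mul, "x1", 2.0), "x2")

print(classify(tree, {"x1": 3.0, "x2": 4.0}))  # root value 2.0
print(classify(tree, {"x1": 1.0, "x2": 5.0}))  # root value -3.0
```

The tree yields a class label but no readable IF-THEN rule, which is the comprehensibility limitation the abstract points out.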

8. Evolutionary Algorithms for Clustering

Abstract

In section 2.4 we reviewed the basic ideas of two major types of clustering methods, namely iterative-partitioning and hierarchical methods. In this chapter we discuss several issues in the development of Evolutionary Algorithms (EAs) for clustering following the iterative-partitioning approach, which seems the most common approach in the EA literature. (In passing, we mention that a couple of recent projects on GAs for clustering following the hierarchical approach can be found in [Rizzi 1998; Lozano and Larranaga 1999].)

Alex A. Freitas

9. Evolutionary Algorithms for Data Preparation

Abstract

Clearly the quality of discovered knowledge strongly depends on the quality of the data being mined. This has motivated the development of several algorithms for data preparation tasks, as discussed in chapter 4.

Alex A. Freitas

10. Evolutionary Algorithms for Discovering Fuzzy Rules

Abstract

This chapter discusses several concepts and issues in the development of Evolutionary Algorithms (EAs) for discovering fuzzy prediction rules. We start with a review of basic concepts of fuzzy sets, in section 10.1, and with a discussion on the difference between fuzzy prediction rules and crisp prediction rules, in section 10.2.

Alex A. Freitas

11. Scaling up Evolutionary Algorithms for Large Data Sets

Abstract

One well-known disadvantage of evolutionary algorithms (EAs) for rule discovery is that in general they are slow, by comparison with rule discovery algorithms based on the rule induction paradigm. After all, rule induction algorithms usually perform a kind of local search in the rule space, whereas EAs are population-based algorithms that perform a more global search of the rule space.

Alex A. Freitas

12. Conclusions and Research Directions

Abstract

This chapter is divided into two parts. Section 12.1 presents some general remarks on data mining with evolutionary algorithms (EAs). These remarks can be regarded as a very compact summary of the main arguments of this book. They discuss, in general, the suitability of EAs for data mining with respect to the issues of predictive accuracy and comprehensibility of discovered knowledge, as well as the issue of the computational time taken by EAs.

Alex A. Freitas

Backmatter

Title: Data Mining and Knowledge Discovery with Evolutionary Algorithms
Author: Dr. Alex A. Freitas
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-662-04923-5
Print ISBN: 978-3-642-07763-0
DOI: https://doi.org/10.1007/978-3-662-04923-5
