
2002 | Book

Data Mining and Knowledge Discovery with Evolutionary Algorithms

Written by: Dr. Alex A. Freitas

Publisher: Springer Berlin Heidelberg

Book series: Natural Computing Series


About this book

This book addresses the integration of two areas of computer science, namely data mining and evolutionary algorithms. Both these areas have become increasingly popular in the last few years, and their integration is currently an area of active research. In essence, data mining consists of extracting valid, comprehensible, and interesting knowledge from data. Data mining is actually an interdisciplinary field, since there are many kinds of methods that can be used to extract knowledge from data. Arguably, data mining mainly uses methods from machine learning (a branch of artificial intelligence) and statistics (including statistical pattern recognition). Our discussion of data mining and evolutionary algorithms is primarily based on machine learning concepts and principles. In particular, in this book we emphasize the importance of discovering comprehensible, interesting knowledge, which the user can potentially use to make intelligent decisions. In a nutshell, the motivation for applying evolutionary algorithms to data mining is that evolutionary algorithms are robust search methods which perform a global search in the space of candidate solutions (rules or another form of knowledge representation). In contrast, most rule induction methods perform a local, greedy search in the space of candidate rules. Intuitively, the global search of evolutionary algorithms can discover interesting rules and patterns that would be missed by the greedy search.

Table of contents

Frontmatter
1. Introduction
Abstract
Nowadays there is a huge amount of data stored in real-world databases, and this amount continues to grow fast. As pointed out by [Piatetsky-Shapiro 1991], this creates both an opportunity and a need for (semi-)automatic methods that discover the knowledge “hidden” in such databases. If such knowledge discovery activity is successful, discovered knowledge can be used to improve the decision-making process of an organization.
Alex A. Freitas
2. Data Mining Tasks and Concepts
Abstract
There are several data mining tasks. Each task can be considered as a kind of problem to be solved by a data mining algorithm. Therefore, each task has its own requirements, and the kind of knowledge discovered by solving one task is usually very different — and it is often used for very different purposes — from the kind of knowledge discovered by solving another task. Therefore, the first step in the development of a data mining algorithm is to define which data mining task the algorithm will address.
Alex A. Freitas
3. Data Mining Paradigms
Abstract
As mentioned in the Introduction, since data mining is a very interdisciplinary field, there are many different paradigms of data mining algorithms, such as decision-tree building, rule induction, instance-based learning (or nearest neighbor), neural networks, statistical algorithms, evolutionary algorithms, etc. [Dhar and Stein 1997; Mitchell 1997; Langley 1996; Michie et al. 1994].
Alex A. Freitas
4. Data Preparation
Abstract
No matter how “intelligent” a data mining algorithm is, it will fail to discover high-quality knowledge if it is applied to low-quality data. In this chapter we focus on data preparation methods for data mining. The general goal is to improve the quality of the data being mined, to facilitate the application of a data mining algorithm. Hence, the methods discussed in this chapter can be regarded as a form of preprocessing for a data mining algorithm.
Alex A. Freitas
5. Basic Concepts of Evolutionary Algorithms
Abstract
This chapter discusses some basic concepts and principles of Evolutionary Algorithms (EAs), focusing mainly on Genetic Algorithms (GAs) and Genetic Programming (GP). The main goal of this chapter is to help the reader who is not familiar with these kinds of algorithm to better understand the next chapters of this book.
Alex A. Freitas
6. Genetic Algorithms for Rule Discovery
Abstract
In this chapter we discuss several issues related to developing genetic algorithms (GAs) for prediction-rule discovery. The development of a GA for rule discovery involves a number of nontrivial design decisions. In this chapter we categorize these decisions into five groups, each of them discussed in a separate section, as follows.
Alex A. Freitas
7. Genetic Programming for Rule Discovery
Abstract
In subsection 5.4.4 we saw that standard Genetic Programming (GP) for symbolic regression — where all terminals are real-valued variables or constants and all functions have real-valued inputs and output — can be used for classification, if the numeric value output at the root of the tree is properly interpreted. However, this kind of GP does not produce high-level, comprehensible IF-THEN rules in the style of the rules discovered by rule induction and decision-tree building algorithms (chapter 3) and GAs for rule discovery (chapter 6). As discussed in subsection 1.1.1, rule comprehensibility is important whenever the discovered knowledge is used for decision making by a human user.
Alex A. Freitas
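The abstract of chapter 7 notes that a symbolic-regression GP tree can act as a classifier if the numeric value output at its root is properly interpreted. The following is a minimal sketch of one such interpretation, not the book's own procedure: it assumes a two-class problem, a tree encoded as nested tuples, and a fixed threshold of 0. The names `evaluate`, `classify`, `OPS`, the attribute names `x1` and `x2`, and the class labels are all hypothetical.

```python
import operator

# Assumed function set: every function takes two real inputs
# and returns a real output, as in symbolic regression.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, record):
    """Recursively evaluate an expression tree on one data record.

    A node is either a tuple (function_symbol, left, right),
    an attribute name (str) looked up in the record,
    or a numeric constant.
    """
    if isinstance(node, tuple):
        symbol, left, right = node
        return OPS[symbol](evaluate(left, record), evaluate(right, record))
    if isinstance(node, str):
        return record[node]
    return node


def classify(tree, record, threshold=0.0):
    """Interpret the root output as a class (assumed thresholding scheme)."""
    return "positive" if evaluate(tree, record) > threshold else "negative"


# Hypothetical evolved individual representing 2 * x1 - x2.
tree = ("-", ("*", 2.0, "x1"), "x2")

print(classify(tree, {"x1": 3.0, "x2": 1.0}))
print(classify(tree, {"x1": 1.0, "x2": 5.0}))
```

A tree like this predicts a class but is just an arithmetic expression, which illustrates why the abstract contrasts it with high-level, comprehensible IF-THEN rules.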
8. Evolutionary Algorithms for Clustering
Abstract
In section 2.4 we reviewed the basic ideas of two major types of clustering methods, namely iterative-partitioning and hierarchical methods. In this chapter we discuss several issues in the development of Evolutionary Algorithms (EAs) for clustering following the iterative-partitioning approach, which seems the most common approach in the EA literature. (In passing, we mention that a couple of recent projects on GAs for clustering following the hierarchical approach can be found in [Rizzi 1998; Lozano and Larranaga 1999].)
Alex A. Freitas
9. Evolutionary Algorithms for Data Preparation
Abstract
Clearly the quality of discovered knowledge strongly depends on the quality of the data being mined. This has motivated the development of several algorithms for data preparation tasks, as discussed in chapter 4.
Alex A. Freitas
10. Evolutionary Algorithms for Discovering Fuzzy Rules
Abstract
This chapter discusses several concepts and issues in the development of Evolutionary Algorithms (EAs) for discovering fuzzy prediction rules. We start with a review of basic concepts of fuzzy sets, in section 10.1, and with a discussion on the difference between fuzzy prediction rules and crisp prediction rules, in section 10.2.
Alex A. Freitas
11. Scaling up Evolutionary Algorithms for Large Data Sets
Abstract
One well-known disadvantage of evolutionary algorithms (EAs) for rule discovery is that in general they are slow, by comparison with rule discovery algorithms based on the rule induction paradigm. After all, rule induction algorithms usually perform a kind of local search in the rule space, whereas EAs are population-based algorithms that perform a more global search of the rule space.
Alex A. Freitas
12. Conclusions and Research Directions
Abstract
This chapter is divided into two parts. Section 12.1 presents some general remarks on data mining with evolutionary algorithms (EAs). These remarks can be regarded as a very compact summary of the main arguments of this book. They discuss, in general, the suitability of EAs for data mining with respect to the issues of predictive accuracy and comprehensibility of discovered knowledge, as well as the issue of the computational time taken by EAs.
Alex A. Freitas
Backmatter
Metadata
Title
Data Mining and Knowledge Discovery with Evolutionary Algorithms
Written by
Dr. Alex A. Freitas
Copyright year
2002
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-04923-5
Print ISBN
978-3-642-07763-0
DOI
https://doi.org/10.1007/978-3-662-04923-5