
2015 | Book

Automatic Design of Decision-Tree Induction Algorithms


About this book

Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue for the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain of interest. To this end, they discuss how one can effectively discover the most suitable set of components of decision-tree induction algorithms for a wide variety of applications through the paradigm of evolutionary computation, following the emergence of the novel field of hyper-heuristics.

"Automatic Design of Decision-Tree Induction Algorithms" would be highly useful for machine learning and evolutionary computation students and researchers alike.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Classification, which is the data mining task of assigning objects to predefined categories, is widely used in the process of intelligent decision making.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
Chapter 2. Decision-Tree Induction
Abstract
Decision-tree induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model and achieve satisfactory accuracy levels in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for decision-tree induction: top-down induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for induction of decision trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building decision-tree induction algorithms. These design choices will be especially relevant when designing an evolutionary algorithm for evolving decision-tree induction algorithms.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
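As a companion to this chapter's summary, the following is a minimal Python sketch (not taken from the book) of the generic top-down induction template it decomposes, with the split criterion and stopping criterion exposed as interchangeable components; all identifiers are illustrative, and pruning and missing-value handling are omitted for brevity.

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Example split criterion: reduction in entropy from splitting on `attribute`."""
    parent = entropy(labels)
    weighted = 0.0
    for value in set(x[attribute] for x in examples):
        subset = [lab for x, lab in zip(examples, labels) if x[attribute] == value]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return parent - weighted

def induce_tree(examples, labels, attributes,
                split_criterion=information_gain,
                stopping_criterion=lambda labels: len(set(labels)) == 1,
                min_examples=2):
    """Generic top-down induction: choose the best split, recurse, stop when told to."""
    if stopping_criterion(labels) or not attributes or len(labels) < min_examples:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = max(attributes, key=lambda a: split_criterion(examples, labels, a))
    node = {best: {}}
    for value in set(x[best] for x in examples):
        idx = [i for i, x in enumerate(examples) if x[best] == value]
        node[best][value] = induce_tree(
            [examples[i] for i in idx], [labels[i] for i in idx],
            [a for a in attributes if a != best],
            split_criterion, stopping_criterion, min_examples)
    return node

Swapping information_gain for, say, the Gini index or the gain ratio, or replacing the stopping rule, changes the induced tree without touching the induction skeleton, which is precisely the modularity this chapter catalogues.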
Chapter 3. Evolutionary Algorithms and Hyper-Heuristics
Abstract
This chapter presents the basic concepts of evolutionary algorithms (EAs) and hyper-heuristics (HHs), the computational techniques directly explored in this book. EAs are well-known population-based metaheuristics. They have been employed in artificial intelligence for many years with the goal of providing near-optimal solutions to problems with very large search spaces. A general overview of EAs is presented in Sect. 3.1. HHs, in turn, are a relatively new field in optimisation research, in which a metaheuristic (often an EA, which is why these related concepts are reviewed together in this chapter) searches the space of heuristics (algorithms) rather than the space of solutions, as conventional metaheuristics do. The near-optimal heuristic (algorithm) provided by an HH approach can then be employed in several distinct problems, instead of relying on a new search process for each new problem to be solved. An overview of HHs is given in Sect. 3.2.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
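To make the EA template from Sect. 3.1 concrete, here is a minimal, generic Python sketch of a generational evolutionary algorithm; the operator functions and parameter values are placeholders supplied by the caller rather than anything prescribed by the book.

import random

def evolve(random_individual, fitness, crossover, mutate,
           pop_size=100, generations=50, tournament_size=2, mutation_rate=0.1):
    """Generational EA: evaluate, select by tournament, recombine, mutate, repeat."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def select():
            # tournament selection: the fittest of a few random competitors wins
            competitors = random.sample(scored, tournament_size)
            return max(competitors, key=lambda pair: pair[0])[1]

        offspring = []
        while len(offspring) < pop_size:
            child = crossover(select(), select())
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)

In a hyper-heuristic such as the one developed later in the book, the individuals handled by random_individual, crossover, and mutate would encode algorithms (heuristics) rather than solutions to a single problem instance, so the same loop searches the space of heuristics.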
Chapter 4. HEAD-DT: Automatic Design of Decision-Tree Algorithms
Abstract
As presented in Chap. 2, for the past 40 years researchers have attempted to improve decision-tree induction algorithms, either by proposing new splitting criteria for internal nodes, by investigating pruning strategies for avoiding overfitting, by testing new approaches for dealing with missing values, or even by searching for alternatives to top-down greedy induction. Each new decision-tree induction algorithm presents some (or many) of these strategies, which are chosen in order to maximize performance in empirical analyses. Nevertheless, the number of different strategies for the several components of a decision-tree algorithm is so vast after these 40 years of research that it would be impracticable for a human being to test all possibilities with the purpose of achieving the best performance on a given data set (or set of data sets). Hence, we pose two questions for researchers in the area: “is it possible to automate the design of decision-tree induction algorithms?”, and, if so, “how can we automate the design of a decision-tree induction algorithm?” The answer to these questions arose with the pioneering work of Pappa and Freitas [30], who proposed the automatic design of rule induction algorithms through an evolutionary algorithm. They proposed a grammar-based GP algorithm for building and evolving individuals that are, in fact, rule induction algorithms. That approach successfully employs EAs to evolve a generic rule induction algorithm, which can then be applied to solve many different classification problems, instead of evolving a specific set of rules tailored to a particular data set. As presented in Chap. 3, in the area of optimisation this type of approach is known as hyper-heuristics (HHs) [5, 6]. HHs are search methods for automatically selecting and combining simpler heuristics, resulting in a generic heuristic that is used to solve any instance of a given optimisation problem. For instance, an HH can generate a generic heuristic for solving any instance of the timetabling problem (i.e., allocation of any number of resources subject to any set of constraints in any schedule configuration), whilst a conventional EA would just evolve a solution to one particular instance of the timetabling problem (i.e., a predefined set of resources and constraints in a given schedule configuration). In this chapter, we present a hyper-heuristic strategy for automatically designing decision-tree induction algorithms, namely HEAD-DT (Hyper-Heuristic Evolutionary Algorithm for Automatically Designing Decision-Tree Algorithms). Section 4.1 introduces HEAD-DT and its evolutionary scheme. Section 4.2 presents the individual representation adopted by HEAD-DT to evolve decision-tree algorithms, as well as information regarding each individual’s genes. Section 4.3 shows the evolutionary cycle of HEAD-DT, detailing its genetic operators. Section 4.4 describes the fitness evaluation process in HEAD-DT, and introduces two possible frameworks for executing HEAD-DT. Section 4.5 computes the total size of the search space that HEAD-DT is capable of traversing, whereas Sect. 4.6 discusses related work.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
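The exact genome used by HEAD-DT is specified in Sect. 4.2; the Python sketch below is only a loose, hypothetical illustration of the underlying idea: a linear individual whose genes each select one option for a decision-tree design component, so the size of the search space is the product of the number of options per gene. The component lists here are abbreviated and invented, not HEAD-DT's actual ones.

import random
from math import prod

# Each gene picks one option for a decision-tree design component
# (a real genome would also carry numeric parameters per component).
DESIGN_SPACE = {
    "split_criterion":    ["information_gain", "gain_ratio", "gini", "chi_square"],
    "stopping_criterion": ["pure_node", "min_examples", "max_depth"],
    "missing_values":     ["ignore", "impute_mode", "distribute_weighted"],
    "pruning":            ["none", "reduced_error", "pessimistic_error", "cost_complexity"],
}

def random_individual():
    """Sample one candidate decision-tree algorithm from the design space."""
    return {gene: random.choice(options) for gene, options in DESIGN_SPACE.items()}

def search_space_size(space):
    """Number of distinct algorithms the genome can represent."""
    return prod(len(options) for options in space.values())

print(random_individual())
print(search_space_size(DESIGN_SPACE))  # 4 * 3 * 3 * 4 = 144 in this toy space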
Chapter 5. HEAD-DT: Experimental Analysis
Abstract
In this chapter, we present several empirical analyses that assess the performance of HEAD-DT in different scenarios. We divide these analyses into two sets of experiments, according to the meta-training strategy employed for automatically designing the decision-tree algorithms. As mentioned in Chap. 4, HEAD-DT can operate in two distinct frameworks: (i) evolving a decision-tree induction algorithm tailored to one specific data set (specific framework); or (ii) evolving a decision-tree induction algorithm from multiple data sets (general framework). The specific framework provides data from a single data set to HEAD-DT for both algorithm design (evolution) and performance assessment. The experiments conducted for this scenario (see Sect. 5.1) make use of public data sets that do not share a common application domain. In the general framework, distinct data sets are used for algorithm design and performance assessment. In this scenario (see Sect. 5.2), we conduct two types of experiments, namely the homogeneous approach and the heterogeneous approach. In the homogeneous approach, we analyse whether automatically designing a decision-tree algorithm for a particular domain provides good results. More specifically, the data sets that feed HEAD-DT during evolution, as well as those employed for performance assessment, share a common application domain. In the heterogeneous approach, we investigate whether HEAD-DT is capable of generating an algorithm that performs well across a variety of different data sets, regardless of their particular characteristics or application domain. We also discuss the theoretical and empirical time complexity of HEAD-DT in Sect. 5.3, and briefly discuss the cost-effectiveness of automated algorithm design in Sect. 5.4. We present examples of algorithms automatically designed by HEAD-DT in Sect. 5.5. We conclude the experimental analysis by empirically verifying in Sect. 5.6 whether the genetic search is worthwhile.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
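A schematic, hypothetical illustration of the two frameworks described above (the data-set names are invented): the specific framework designs and assesses the algorithm on a single data set, whereas the general framework evolves it on a meta-training set and assesses it on disjoint data sets.

# Specific framework: one data set supplies both the meta-training data
# used during evolution and the data used for final performance assessment.
specific_framework = {
    "meta_training": ["dataset_A"],
    "meta_test":     ["dataset_A"],
}

# General framework: evolution sees several data sets (homogeneous: shared
# application domain; heterogeneous: mixed domains), and assessment uses
# data sets never seen during evolution.
general_framework = {
    "meta_training": ["domain1_ds1", "domain1_ds2", "domain1_ds3"],
    "meta_test":     ["domain1_ds4", "domain1_ds5"],
}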
Chapter 6. HEAD-DT: Fitness Function Analysis
Abstract
In Chap. 4, more specifically in Sect. 4.4, we saw that the definition of a fitness function for the scenario in which HEAD-DT evolves a decision-tree algorithm from multiple data sets is an interesting and relevant problem. In the experiments presented in Chap. 5, Sect. 5.2, we employed a simple average over the F-Measure obtained on the data sets that belong to the meta-training set. As previously observed, when evolving an algorithm from multiple data sets, each individual of HEAD-DT has to be executed over each data set in the meta-training set. Hence, instead of obtaining a single value of predictive performance, each individual obtains a set of values that eventually have to be combined into a single measure. In this chapter, we analyse in more detail the impact of different strategies to be used as the fitness function during the evolutionary cycle of HEAD-DT. We divide the experimental scheme into two distinct scenarios: (i) evolving a decision-tree induction algorithm from multiple balanced data sets; and (ii) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each of these scenarios, we analyse the differences in performance among well-known measures such as accuracy, F-Measure, AUC, and recall, as well as a lesser-known criterion, namely the relative accuracy improvement. In addition, we analyse different aggregation schemes, such as the simple average, median, and harmonic mean.
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
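As a rough illustration of the aggregation problem this chapter studies, the Python sketch below (illustrative, not HEAD-DT's implementation) collapses one performance value per meta-training data set into a single fitness using the simple average, median, or harmonic mean.

import statistics

def aggregate_fitness(per_dataset_scores, scheme="mean"):
    """Collapse one score per meta-training data set into a single fitness value."""
    if scheme == "mean":
        return statistics.mean(per_dataset_scores)
    if scheme == "median":
        return statistics.median(per_dataset_scores)
    if scheme == "harmonic_mean":
        return statistics.harmonic_mean(per_dataset_scores)
    raise ValueError(f"unknown aggregation scheme: {scheme}")

# e.g. F-Measures of one candidate algorithm on five meta-training data sets
scores = [0.91, 0.78, 0.85, 0.66, 0.88]
print(aggregate_fitness(scores, "mean"))            # 0.816
print(aggregate_fitness(scores, "median"))          # 0.85
print(aggregate_fitness(scores, "harmonic_mean"))   # penalizes the weak 0.66 more strongly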
Chapter 7. Conclusions
Abstract
We presented in this book an approach for the automatic design of decision-tree induction algorithms, namely HEAD-DT (Hyper-Heuristic Evolutionary Algorithm for Automatically Designing Decision-Tree Induction Algorithms).
Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas
Metadata
Title
Automatic Design of Decision-Tree Induction Algorithms
Authors
Rodrigo C. Barros
André C. P. L. F. de Carvalho
Alex A. Freitas
Copyright Year
2015
Electronic ISBN
978-3-319-14231-9
Print ISBN
978-3-319-14230-2
DOI
https://doi.org/10.1007/978-3-319-14231-9