
2009 | Book

Metalearning

Applications to Data Mining

Authors: Pavel Brazdil, Christophe Giraud-Carrier, Carlos Soares, Ricardo Vilalta

Publisher: Springer Berlin Heidelberg

Book Series: Cognitive Technologies


About this Book

Metalearning is the study of principled methods that exploit metaknowledge to obtain efficient models and solutions by adapting machine learning and data mining processes. While the variety of machine learning and data mining techniques now available can, in principle, provide good model solutions, a methodology is still needed to guide the search for the most appropriate model in an efficient way. Metalearning provides one such methodology that allows systems to become more effective through experience.

This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms. It shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems. It can thus help developers improve their algorithms and also develop learning systems that can improve themselves.

The book will be of interest to researchers and graduate students in the areas of machine learning, data mining and artificial intelligence.

Table of Contents

Frontmatter
1. Metalearning: Concepts and Systems
Current data mining (DM) and machine learning (ML) tools are characterized by a plethora of algorithms but a lack of guidelines to select the right method according to the nature of the problem under analysis. Applications such as credit rating, medical diagnosis, mine-rock discrimination, fraud detection, and identification of objects in astronomical images generate thousands of instances for analysis with little or no additional information about the type of analysis technique most appropriate for the task at hand. Since real-world applications are generally time-sensitive, practitioners and researchers tend to use only a few available algorithms for data analysis, hoping that the set of assumptions embedded in these algorithms will match the characteristics of the data. Such practice in data mining and the application of machine learning has spurred the research community to investigate whether learning from data is made of a single operational layer — search for a good model that fits the data — or whether there are in fact several operational layers that can be exploited to produce an increase in performance over time. The latter alternative implies that it should be possible to learn about the learning process itself, and in particular that a system could learn to profit from previous experience to generate additional knowledge that can simplify the automatic selection of efficient models summarizing the data.
This book provides a review and analysis of a research direction in machine learning and data mining known as metalearning. From a practical standpoint, the goal of metalearning is twofold. On the one hand, we wish to overcome some of the challenges faced by users of current data analysis tools. The aim here is to aid users in the task of selecting a suitable predictive model (or combination of models) while taking into account the domain of application. Without some kind of assistance, model selection and combination can become serious obstacles to end users who wish to access the technology more directly and cost-effectively. End users often lack not only the expertise necessary to select a suitable model, but also access to the many models needed to proceed on a trial-and-error basis. A solution to this problem is attainable through the construction of metalearning systems that provide automatic and systematic user guidance by mapping a particular task to a suitable model (or combination of models).
2. Metalearning for Algorithm Recommendation: an Introduction
Data mining applications normally involve preparation of a dataset that can be processed by a learning algorithm (Figure 2.1). Given that there are usually several algorithms available, the user must select one of them. Additionally, most algorithms have parameters which must be set, so, after choosing the algorithm, the user must decide the values for each one of its parameters. The choice of algorithm is guided by some kind of metaknowledge, that is, knowledge that relates the characteristics of datasets with the performance of the available algorithms. This chapter describes how a simple metalearning system can be developed to generate metaknowledge that can be used to make recommendations concerning which algorithm to use on a given dataset. More details about various options are described in the next chapter.
As there are many alternative algorithms for a given task (for instance, decision trees, neural networks and support vector machines can be used for classification), the approach of trying out all alternatives and choosing the best one becomes infeasible. Although, normally, only a limited number of existing methods are available for use in a given application, the number of these methods may still be too large to rule out extensive experimentation. An approach followed by many users is to make some preselection of a small number of alternatives based on knowledge about the data and the available methods. The methods are applied to the dataset and the best one is normally chosen taking into account the results obtained. Although feasible, this approach may still require considerable computing time. Additionally, it requires that a highly skilled expert preselect the alternatives, and even the most skilled expert may sometimes fail, and so the best option may be left out.
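The metaknowledge described above relates dataset characteristics to algorithm performance. A minimal sketch of the first step, characterizing a dataset by simple meta-features, is given below; the particular features (log of dataset size, number of attributes, class entropy) and the function name `meta_features` are illustrative choices, not the book's specific measures.

```python
import math
from collections import Counter

def meta_features(instances, labels):
    """Compute simple dataset characteristics (hypothetical choices:
    log of size, dimensionality, class entropy) as meta-features."""
    n = len(instances)
    n_attrs = len(instances[0]) if n else 0
    counts = Counter(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log(n), n_attrs, entropy]

# Toy dataset: 4 instances, 2 attributes, 2 balanced classes
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 1.0]]
y = ["a", "a", "b", "b"]
print(meta_features(X, y))  # [log(4), 2, 1.0]
```

In a full system, such vectors are stored for every dataset processed so far, alongside the measured performance of each candidate algorithm.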
3. Development of Metalearning Systems for Algorithm Recommendation
In the previous chapter, a metalearning approach to support the selection of learning algorithms was described. The approach was illustrated with a simple method that provides a recommendation concerning which algorithm to use on a given learning problem. The method predicts the relative performance of algorithms on a dataset based on their performance on datasets that were previously processed.
The development of metalearning systems for algorithm recommendation involves addressing several issues not only at the meta level (lower part of Figure 3.1) but also at the base level (top part of Figure 3.1). At the meta level, it is necessary, first of all, to choose the type of the target feature (or metatarget, for short), that is, the form of the recommendation that is provided to the user. In the system presented in the previous chapter, the form of recommendation adopted was rankings of base-algorithms. The type of metatarget determines the type of meta-algorithm, that is, the metalearning methods that can be used. This in turn determines the type of metaknowledge that can be obtained. The meta-algorithm described in the previous chapter was an adaptation of the k-nearest neighbors (k-NN) algorithm for ranking. The metatarget and the meta-algorithm are discussed in more detail in Section 3.2.
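The k-NN meta-algorithm for ranking mentioned above can be sketched as follows: a hypothetical metadatabase stores, for each previously processed dataset, its meta-feature vector and the observed ranks of the base-algorithms; for a new dataset, the k most similar datasets are retrieved and their rankings averaged. The names (`knn_ranking`, the algorithm labels) are illustrative, not the book's implementation.

```python
import math

def knn_ranking(new_mf, meta_db, k=2):
    """Recommend a ranking of base-algorithms for a new dataset:
    find the k stored datasets with the closest meta-features
    (Euclidean distance), then average the algorithms' ranks there."""
    neighbours = sorted(meta_db, key=lambda rec: math.dist(rec["mf"], new_mf))[:k]
    algs = neighbours[0]["ranks"].keys()
    avg = {a: sum(n["ranks"][a] for n in neighbours) / k for a in algs}
    return sorted(avg, key=avg.get)  # best (lowest average rank) first

# Hypothetical metadatabase: meta-features and observed algorithm ranks
meta_db = [
    {"mf": [1.0, 2.0], "ranks": {"tree": 1, "nn": 2, "svm": 3}},
    {"mf": [1.1, 2.1], "ranks": {"tree": 2, "nn": 1, "svm": 3}},
    {"mf": [9.0, 9.0], "ranks": {"svm": 1, "nn": 2, "tree": 3}},
]
print(knn_ranking([1.05, 2.05], meta_db, k=2))
```

Here the two similar datasets dominate the recommendation, so "svm", which only excels on the dissimilar dataset, is ranked last.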
4. Extending Metalearning to Data Mining and KDD
Although a valid intellectual challenge in its own right, metalearning finds its real raison d'être in the practical support it offers Data Mining practitioners. The metaknowledge induced by metalearning provides the means to inform decisions about the precise conditions under which a given algorithm, or sequence of algorithms, is better than others for a given task. Without such knowledge, intelligent but uninformed practitioners faced with a new Data Mining task are limited to selecting the most suitable algorithm(s) by trial and error. With the large number of possible alternatives, an exhaustive search through the space of algorithms is impractical; and simply choosing the algorithm that somehow “appears” most promising is likely to yield suboptimal solutions. Furthermore, the increased amount and detail of data available within organizations is leading to a demand for a much larger number of models, up to hundreds or even thousands, a situation leading to what has been referred to as Extreme Data Mining [96]. Current approaches to Data Mining remain largely dependent on human efforts and are thus not suitable for this kind of extreme setting because of the large amount of human resources required. Since metalearning can help reduce the need for human intervention, it may be expected to play a major role in these large-scale Data Mining applications. In this chapter, we describe some of the most significant attempts at integrating metaknowledge in Data Mining decision support systems.
While Data Mining software packages (e.g., Enterprise Miner, Clementine, Insightful Miner, PolyAnalyst, KnowledgeStudio, Weka, RapidMiner, Xelopes) provide user-friendly access to rich collections of algorithms, they generally offer no real decision support to nonexpert end users. Similarly, tools with emphasis on advanced visualization (e.g., [121, 122]) help users understand the data (e.g., to select adequate transformations) and the models (e.g., to adjust parameters, compare results, and focus on specific parts of the model), but treat algorithm selection as an activity driven by the users rather than the system. The discussion in this chapter purposely leaves out such software packages and visualization tools. The focus is strictly on systems that guide users by producing explicit advice automatically.
5. Combining Base-Learners
Model combination consists of creating a single learning system from a collection of learning algorithms. In some sense, model combination may be viewed as a variation on the theme of combining data mining operations discussed in Chapter 4. There are two basic approaches to model combination. The first one exploits variability in the application's data and combines multiple copies of a single learning algorithm applied to different subsets of that data. The second one exploits variability among learning algorithms and combines several learning algorithms applied to the same application's data.
The main motivation for combining models is to reduce the probability of misclassification based on any single induced model by increasing the system's area of expertise through combination. Indeed, one of the implicit assumptions of model selection in metalearning is that there exists an optimal learning algorithm for each task. Although this clearly holds in the sense that, given a task φ and a set of learning algorithms {A_k}, there is a learning algorithm A_φ in {A_k} that performs better than all of the others on φ, the actual performance of A_φ may still be poor. In some cases, one may mitigate the risk of settling for a suboptimal learning algorithm by replacing single model selection with model combination.
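The first approach to model combination, one learning algorithm applied to different subsets of the data, can be sketched as bootstrap aggregation with a majority vote. The toy learner (a 1-nearest-neighbour rule on a single attribute) and all names here are illustrative assumptions, not a method from the book.

```python
import random
from collections import Counter

def bagging_predict(learner, X, y, x_new, n_models=10, seed=0):
    """Combine multiple copies of one learning algorithm, each trained
    on a bootstrap sample of the data; predictions are combined by
    majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        model = learner([X[i] for i in idx], [y[i] for i in idx])
        votes.append(model(x_new))
    return Counter(votes).most_common(1)[0][0]

# Toy base learner: 1-nearest-neighbour on one numeric attribute
def one_nn(X, y):
    return lambda x: y[min(range(len(X)), key=lambda i: abs(X[i] - x))]

X = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = ["low", "low", "low", "high", "high", "high"]
print(bagging_predict(one_nn, X, y, 2.5))  # majority vote: "low"
```

The second approach would instead train several different learning algorithms on the same data and combine their votes in the same way.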
6. Bias Management in Time-Changing Data Streams
João Gama, Gladys Castillo
7. Transfer of Metaknowledge Across Tasks
We have mentioned before that learning should not be viewed as an isolated task that starts from scratch with every new problem. Instead, a learning algorithm should exhibit the ability to adapt through a mechanism dedicated to transfer knowledge gathered from previous experience [258, 254, 206, 50]. The problem of transfer of metaknowledge is central to the field of learning to learn and is also known as inductive transfer. In this case, metaknowledge can be understood as a collection of patterns observed across tasks. One view of the nature of patterns across tasks is that of invariant transformations. For example, image recognition of a target object is simplified if the object is invariant under rotation, translation, scaling, etc. A learning system should be able to recognize a target object on an image even if previous images show the object in different sizes or from different angles. Hence, learning to learn studies how to improve learning by detecting, extracting, and exploiting metaknowledge in the form of invariant transformations across tasks.
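The rotation-invariance idea above can be illustrated by a minimal sketch: encode the invariance as metaknowledge by replicating each training point at several rotations, so that a plain nearest-neighbour classifier recognises rotated versions of known objects. The 2D points, the angle grid, and all function names are hypothetical illustrations, not the mechanisms discussed in the chapter.

```python
import math

def rotate(point, angle):
    """Rotate a 2D point about the origin by the given angle (radians)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def augment_with_rotations(X, y, n_angles=12):
    """Bake rotation invariance into the training set: every point is
    replicated at n_angles evenly spaced rotations."""
    Xa, ya = [], []
    for p, label in zip(X, y):
        for k in range(n_angles):
            Xa.append(rotate(p, 2 * math.pi * k / n_angles))
            ya.append(label)
    return Xa, ya

def predict_1nn(X, y, q):
    return y[min(range(len(X)), key=lambda i: math.dist(X[i], q))]

X = [(1.0, 0.0), (5.0, 0.0)]   # two "objects" at different radii
y = ["near", "far"]
Xa, ya = augment_with_rotations(X, y)
print(predict_1nn(Xa, ya, rotate((1.0, 0.0), 1.0)))  # rotated "near" object
```

Because the invariance is a property of the domain rather than of any single task, the same augmentation can be reused for every new task, which is the sense in which it acts as transferable metaknowledge.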
In this chapter we take a look at various attempts to transfer metaknowledge across tasks. In its most common form, the process of inductive transfer maintains the learning algorithm unchanged (Sections 7.2.1–7.2.4), but the literature also presents more complex scenarios where the learning architecture itself evolves with experience according to a set of rules (Section 7.2.5). We present recent developments on the theoretical aspects of learning to learn (Section 7.3). We end our chapter by looking at practical challenges in knowledge transfer (Section 7.4).
8. Composition of Complex Systems: Role of Domain-Specific Metaknowledge
The aim of this chapter is to discuss the problem of employing learning methods in the design of complex systems. The term complex systems is used here to identify systems that cannot be learned in one step, but rather require several phases of learning. Our aim will be to show how domain-specific metaknowledge can be used to facilitate this task.
Backmatter
Metadata
Title
Metalearning
Authors
Pavel Brazdil
Christophe Giraud-Carrier
Carlos Soares
Ricardo Vilalta
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-73263-1
Print ISBN
978-3-540-73262-4
DOI
https://doi.org/10.1007/978-3-540-73263-1
