nach oben

Open Access 2022 | Open Access | Buch

Metalearning

Applications to Automated Machine Learning and Data Mining

verfasst von: Prof. Pavel Brazdil, Dr. Jan N. van Rijn, Dr. Carlos Soares, Dr. Joaquin Vanschoren

Verlag: Springer International Publishing

Buchreihe : Cognitive Technologies

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Inhaltsverzeichnis

Über dieses Buch

This open access book as one of the fastest-growing areas of research in machine learning, metalearning studies principled methods to obtain efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks and the adaptive processes can involve machine learning approaches. As a related area to metalearning and a hot topic currently, automated machine learning (AutoML) is concerned with automating the machine learning processes. Metalearning and AutoML can help AI learn to control the application of different learning methods and acquire new solutions faster without unnecessary interventions from the user.

This book offers a comprehensive and thorough introduction to almost all aspects of metalearning and AutoML, covering the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and also how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers to develop systems that can improve themselves through experience.

This book is a substantial update of the first edition published in 2009. It includes 18 chapters, more than twice as much as the previous version. This enabled the authors to cover the most relevant topics in more depth and incorporate the overview of recent research in the respective area. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining, data science and artificial intelligence.

Inhaltsverzeichnis

Frontmatter

PDF

Basic Concepts and Architecture

Frontmatter

Open Access

Chapter 1. Introduction

Summary

This chapter starts by describing the organization of the book, which consists of three parts. Part I discusses some basic concepts, including, for instance, what metalearning is and how it is related to automatic machine learning (AutoML). This continues with a presentation of the basic architecture of metalearning/AutoML systems, discussion of systems that exploit algorithm selection using prior metadata, methodology used in their evaluation, and different types of meta-level models, while mentioning the respective chapters where more details can be found. This part also includes discussion of methods used for hyperparameter optimization and workflow design. Part II includes the discussion of more advanced techniques and methods. The first chapter discusses the problem of setting up configuration spaces and conducting experiments. Subsequent chapters discuss different types of ensembles, metalearning in ensemble methods, algorithms used for data streams and transfer of meta-models across tasks. One chapter is dedicated to metalearning for deep neural networks. The last two chapters discuss the problem of automating various data science tasks and trying to design systems that are more complex. Part III is relatively short. It discusses repositories of metadata (including experimental results) and exemplifies what can be learned from this metadata by giving illustrative examples. The final chapter presents concluding remarks.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 2. Metalearning Approaches for Algorithm Selection I (Exploiting Rankings)

Summary

This chapter discusses an approach to the problem of algorithm selection, which exploits the performance metadata of algorithms (workflows) on prior tasks to generate recommendations for a given target dataset. The recommendations are in the form of rankings of candidate algorithms. The methodology involves two phases. In the first one, rankings of algorithms/workflows are elaborated on the basis of historical performance data on different datasets. These are subsequently aggregated into a single ranking (e.g. average ranking). In the second phase, the average ranking is used to schedule tests on the target dataset with the objective of identifying the best performing algorithm. This approach requires that an appropriate evaluation measure, such as accuracy, is set beforehand. In this chapter we also describe a method that builds this ranking based on a combination of accuracy and runtime, yielding good anytime performance. While this approach is rather simple, it can still provide good recommendations to the user. Although the examples in this chapter are from the classification domain, this approach can be applied to other tasks besides algorithm selection, namely hyperparameter optimization (HPO), as well as the combined algorithm selection and hyperparameter optimization (CASH) problem. As this approach works with discrete data, continuous hyperparameters need to be discretized first.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 3. Evaluating Recommendations of Metalearning/AutoML Systems

Abstract

This chapter discusses some typical approaches that are commonly used to evaluate metalearning and AutoML systems. This helps us to establish whether we can trust the recommendations provided by a particular system, and also provides a way of comparing different competing approaches. As the performance of algorithms may vary substantially across different tasks, it is often necessary to normalize the performance values first to make comparisons meaningful. This chapter discusses some common normalization methods used. As often a given metalearning system outputs a sequence of algorithms to test, we can study how similar this sequence is from the ideal sequence. This can be determined by looking at a degree of correlation between the two sequences. This chapter provides more details on this issue. One common way of comparing systems is by considering the effect of selecting different algorithms (workflows) on base-level performance and determining how the performance evolves with time. If the ideal performance is known, it is possible to calculate the value of performance loss. The loss curve shows how the loss evolves with time or what its value is at the maximum available time (i.e., the time budget) given beforehand. This chapter also describes the methodology that is commonly used in comparisons involving several metalearning/AutoML systems with recourse to statistical tests.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 4. Dataset Characteristics (Metafeatures)

Summary

This chapter discusses dataset characteristics that play a crucial role in many metalearning systems. Typically, they help to restrict the search in a given configuration space. The basic characteristic of the target variable, for instance, determines the choice of the right approach. If it is numeric, it suggests that a suitable regression algorithm should be used, while if it is categorical, a classification algorithm should be used instead. This chapter provides an overview of different types of dataset characteristics, which are sometimes also referred to as metafeatures. These are of different types, and include so-called simple, statistical, information-theoretic, model-based, complexitybased, and performance-based metafeatures. The last group of characteristics has the advantage that it can be easily defined in any domain. These characteristics include, for instance, sampling landmarkers representing the performance of particular algorithms on samples of data, relative landmarkers capturing differences or ratios of performance values and providing estimates of performance gains. The final part of this chapter discusses the specific dataset characteristics used in different machine learning tasks, including classification, regression, time series, and clustering.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 5. Metalearning Approaches for Algorithm Selection II

Summary

This chapter discusses different types of metalearning models, including regression, classification and relative performance models. Regression models use a suitable regression algorithm, which is trained on the metadata and used to predict the performance of given base-level algorithms. The predictions can in turn be used to order the base-level algorithms and hence identify the best one. These models also play an important role in the search for the potentially best hyperparameter configuration discussed in the next chapter. Classification models identify which base-level algorithms are applicable or non-applicable to the target classification task. Probabilistic classifiers can be used to construct a ranking of potentially useful alternatives. Relative performance models exploit information regarding the relative performance of base-level models, which can be either in the form of rankings or pairwise comparisons. This chapter discusses various methods that use this information in the search for the potentially best algorithm for the target task.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 6. Metalearning for Hyperparameter Optimization

Summary

This chapter describes various approaches for the hyperparameter optimization (HPO) and combined algorithm selection and hyperparameter optimization problems (CASH). It starts by presenting some basic hyperparameter optimization methods, including grid search, random search, racing strategies, successive halving and hyperband. Next, it discusses Bayesian optimization, a technique that learns from the observed performance of previously tried hyperparameter settings on the current task. This knowledge is used to build a meta-model (surrogate model) that can be used to predict which unseen configurations may work better on that task. This part includes the description sequential model-based optimization (SMBO). This chapter also covers metalearning techniques that extend the previously discussed optimization techniques with the ability to transfer knowledge across tasks. This includes techniques such as warm-starting the search, or transferring previously learned meta-models that were trained on prior (similar) tasks. A key question here is how to establish how similar prior tasks are to the new task. This can be done on the basis of past experiments, but can also exploit the information gained from recent experiments on the target task. This chapter presents an overview of some recent methods proposed in this area.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 7. Automating Workflow/Pipeline Design

Summary

This chapter discusses the design of workflows (or pipelines), which represent solutions that involve more than one algorithm. This is motivated by the fact that many tasks require such solutions. This problem is non-trivial, as the number of possible workflows (and their configurations) can be rather large. This chapter discusses various methods that can be used to restrict the design options and thus reduce the size of the configuration space. These include, for instance, ontologies and context-free grammars. Each of these formalisms has its merits and shortcomings. Many platforms have resorted to planning systems that use operators. These can be designed to be in accordance with the given ontologies or grammars. As the search space may be rather large, it is important to leverage prior experience. This topic is addressed in one of the sections, which discusses rankings of plans that have proved to be useful in the past. The workflows/pipelines that have proved successful in the past can be retrieved and used as plans in future tasks. Thus, it is possible to exploit both planning and metalearning.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Advanced Techniques and Methods

Frontmatter

Open Access

Chapter 8. Setting Up Configuration Spaces and Experiments

Summary

This chapter discusses the issues relative to so-called configuration spaces that need to be set up before initiating the search for a solution. It starts by introducing some basic concepts, such as discrete and continuous subspaces. Then it discusses certain criteria that help us to determine whether the given configuration space is (or is not) adequate for the tasks at hand. One important topic which is addressed here is hyperparameter importance, as it helps us to determine which hyperparameters have a high influence on the performance and should therefore be optimized. This chapter also discusses some methods for reducing the configuration space. This is important as it can speed up the process of finding the potentially best workflow for the new task. One problem that current systems face nowadays is that the number of alternatives in a given configuration space can be so large that it is virtually impossible to gather complete metadata. This chapter discusses the issue of whether the system can still function satisfactorily even when the metadata is incomplete. The final part of this chapter discusses some strategies that can be used for gathering metadata that originated in the area of multi-armed bandits, including, for instance, SoftMax, upper confidence bound (UCB) and pricing strategies.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 9. Combining Base-Learners into Ensembles

Abstract

This chapter discusses ensembles of classification or regression models, because they represent an important area of machine learning. They have become popular as they tend to achieve high performance when compared with single models. Besides, they also play an essential role in data-streaming solutions. This chapter starts by introducing ensemble learning and presents an overview of some of its most well-known methods. These include bagging, boosting, stacking, cascade generalization, cascading, delegating, arbitrating and meta-decision trees.

Christophe Giraud-Carrier

PDF

Open Access

Chapter 10. Metalearning in Ensemble Methods

Abstract

This chapter discusses some approaches that exploit metalearning methods in ensemble learning. It starts by presenting a set of issues, such as the ensemble method used, which affect the process of ensemble learning and the resulting ensemble. In this chapter we discuss various lines of research that were followed. Some approaches seek an ensemble-based solution for the whole dataset, others for individual instances. Regarding the first group, we focus on metalearning in the construction, pruning and integration phase. Modeling the interdependence of models plays an important part in this process. In the second group, the dynamic selection of models is carried out for each instance. A separate section is dedicated to hierarchical ensembles and some methods used in their design. As this area involves potentially very large configuration spaces, recourse to advanced methods, including metalearning, is advantageous. It can be exploited to define the competence regions of different models and the dependencies between them.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 11. Algorithm Recommendation for Data Streams

Abstract

This chapter focuses on metalearning approaches that have been applied to data streams. This is an important area, as many real-world data arrive in the form of a stream of observations. We first review some important aspects of the data stream setting, which may involve online learning, non-stationarity, and concept drift.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 12. Transfer of Knowledge Across Tasks

Abstract

This area is often referred to as transfer of knowledge across tasks, or simply transfer learning; it aims at developing learning algorithms that leverage the results of previous learning tasks. This chapter discusses different approaches in transfer learning, such as representational transfer, where transfer takes place after one or more source models have been trained. There is an explicit form of knowledge transferred directly to the target model or to the meta-model. The chapter also discusses functional transfer, where two or more models are trained simultaneously. This situation is sometimes referred to as multi-task learning. In this approach, the models share their internal structure (or possibly some parts) during learning. Other topics include instance-, feature-, and parameter-based transfer learning, often used to initialize the search on the target domain. A distinct topic is transfer learning in neural networks, which includes, for instance, the transfer of a part of the network structure. The chapter also presents the double loop architecture, where the base-learner iterates over the training set in an inner loop, while the metalearner iterates over different tasks to learn metaparameters in an outer loop. Details are given on transfer learning within kernel methods and parametric Bayesian models.

Ricardo Vilalta, Mikhail M. Meskhi

PDF

Open Access

Chapter 13. Metalearning for Deep Neural Networks

Abstract

Deep neural networks have enabled large breakthroughs in various domains ranging from image and speech recognition to automated medical diagnosis. However, these networks are notorious for requiring large amounts of data to learn from, limiting their applicability in domains where data is scarce. Through metalearning, the networks can learn how to learn, allowing them to learn from fewer data. In this chapter, we provide a detailed overview of metalearning for knowledge transfer in deep neural networks. We categorize the techniques into (i) metric-based, (ii) model-based, and (iii) optimization-based techniques, cover the key techniques per category, discuss open challenges, and provide directions for future research such as performance evaluation on heterogeneous benchmarks.

Mike Huisman, Jan N. van Rijn, Aske Plaat

PDF

Open Access

Chapter 14. Automating Data Science

Abstract

It has been observed that, in data science, a great part of the effort usually goes into various preparatory steps that precede model-building. The aim of this chapter is to focus on some of these steps. A comprehensive description of a given task to be resolved is usually supplied by the domain expert. Techniques exist that can process natural language description to obtain task descriptors (e.g., keywords), determine the task type, the domain, and the goals. This in turn can be used to search for the required domain-specific knowledge appropriate for the given task. In some situations, the data required may not be available and a plan needs to be elaborated regarding how to get it. Although not much research has been done in this area so far, we expect that progress will be made in the future. In contrast to this, the area of preprocessing and transformation has been explored by various researchers. Methods exist for selection of instances and/or elimination of outliers, discretization and other kinds of transformations. This area is sometimes referred to as data wrangling. These transformations can be learned by exploiting existing machine learning techniques (e.g., learning by demonstration). The final part of this chapter discusses decisions regarding the appropriate level of detail (granularity) to be used in a given task. Although it is foreseeable that further progress could be made in this area, more work is needed to determine how to do this effectively.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 15. Automating the Design of Complex Systems

Abstract

This chapter discusses the issue of whether it is possible to automate the design of rather complex workflows needed when addressing more complex data science tasks. The focus here is on symbolic approaches, which continue to be relevant. The chapter starts by discussing some more complex operators, including, for instance, conditional operators and operators used in iterative processing. Next, we discuss the issue of introduction of new concepts and the changes of granularity that can be achieved as a result. We review various approaches explored in the past, such as constructive induction, propositionalization, reformulation of rules, among others, but also draw attention to some new advances, such as feature construction in deep NNs. It is foreseeable that in the future both symbolic and subsymbolic approaches will coexist in systems exhibiting a kind of functional symbiosis. There are tasks that cannot be learned in one go, but rather require a sub-division into subtasks, a plan for learning the constituents, and joining the parts together. Some of these subtasks may be interdependent. Some tasks may require an iterative process in the process of learning. This chapter discusses various examples that can stimulate both further research and some practical solutions in this rather challenging area.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Organizing and Exploiting Metadata

Frontmatter

Open Access

Chapter 16. Metadata Repositories

Summary

This chapter presents a review of online repositories where researchers can share data, code, and experiments. In particular, it covers OpenML, an online platform for sharing and organizing machine learning data automatically. OpenML contains thousands of datasets and algorithms, and millions of experimental results. We describe the basic philosophy involved, and its basic components: datasets, tasks, flows, setups, runs, and benchmark suites. OpenML has API bindings in various programming languages, making it easy for users to interact with the API in their native language. One important feature of OpenML is the integration into various machine learning toolboxes, such as Scikit-learn, Weka, and mlR. Users of these toolboxes can automatically upload all their results, leading to a large repository of experimental results.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 17. Learning from Metadata in Repositories

Abstract

This chapter describes the various types of experiments that can be done with the vast amount of data, stored in experiment databases. We focus on three types of experiments done with the data stored in OpenML.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Open Access

Chapter 18. Concluding Remarks

Summary

As metaknowledge has a central role in many approaches discussed in this book, we address the issue of what kind of metaknowledge is used in different metalearning/AutoML tasks, such as algorithm selection, hypeparameter optimization, and workflow generation. We draw attention to the fact that some metaknowledge is acquired (learned) by the systems, while other is given (e.g., different aspects of the given configuration space). This chapter continues by discussing future challenges, such as how to achieve better integration of metalearning and AutoML approaches, and what kind of guidance could be provided by the system when configuring metalearning/AutoML systems to new settings. This task may involve (semi-)automatic reduction of configuration spaces to make the search more effective. The last part of this chapter discusses various challenges encountered when trying to automate different steps of data science.

Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren

PDF

Backmatter

PDF

Titel: Metalearning
verfasst von: Prof. Pavel Brazdil
Dr. Jan N. van Rijn
Dr. Carlos Soares
Dr. Joaquin Vanschoren
Verlag: Springer International Publishing
Electronic ISBN: 978-3-030-67024-5
Print ISBN: 978-3-030-67023-8
DOI: https://doi.org/10.1007/978-3-030-67024-5