2016 | Book

Multilabel Classification

Problem Analysis, Metrics and Techniques

Authors: Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus

Publisher: Springer International Publishing

About this book

This book offers a comprehensive review of multilabel techniques widely used to classify and label texts, pictures, videos and music on the Internet. A deep review of the specialized literature in the field includes the available software needed to work with this kind of data. It provides the user with the software tools needed to deal with multilabel data, as well as step-by-step instructions on how to use them. The main topics covered are:
• The special characteristics of multi-labeled data and the metrics available to measure them.
• The importance of taking advantage of label correlations to improve the results.
• The different approaches followed to face multi-label classification.
• The preprocessing techniques applicable to multi-label datasets.
• The available software tools to work with multi-label data.
This book is beneficial for professionals and researchers in a variety of fields because of the wide range of potential applications for multilabel classification. Besides its many applications in classifying different types of online information, it is also useful in other areas, such as genomics and biology. No previous knowledge of the subject is required. The book introduces all the concepts needed to understand multilabel data characterization, treatment and evaluation.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
This book is focused on multilabel classification and related topics. Multilabel classification is one specific type of classification, classification being one of the usual tasks in the data mining field. Data mining itself can be seen as one step in a broader process, the discovery of new knowledge from databases. The goal of this first chapter is to introduce all these concepts, aiming to set the working context for the topics covered in the following ones. A general outline in this respect is given in Sect. 1.1. Section 1.2 provides an overview of the whole Knowledge Discovery in Databases process. Section 1.3 introduces the essential preprocessing tasks. Then, the different learning styles in use nowadays are explained in Sect. 1.4, and lastly multilabel classification is introduced in comparison with other traditional types of classification in Sect. 1.5.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 2. Multilabel Classification
Abstract
This book is concerned with the classification of multilabeled data and other tasks related to that subject. The goal of this chapter is to formally introduce the problem, as well as to give a broad overview of its main application fields and how it has been tackled by experts. A general introduction to the matter is provided in Sect. 2.1, followed by a formal definition of the multilabel classification problem in Sect. 2.2. Some of the main application fields of multilabel classification are portrayed in Sect. 2.3. Lastly, the approaches followed to address this task are introduced in Sect. 2.4.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 3. Case Studies and Metrics
Abstract
Multilabel classification (MLC) techniques have been applied in many real-world situations in the last two decades. Each one represents a different case study for MLC, using one or more multilabel datasets (MLDs). After the general overview provided in Sect. 3.1, this chapter begins by briefly describing in Sect. 3.2 the most common case studies found in the literature. As a result, a full list of available MLDs will be obtained, and the usual characterization metrics are explained and put to use with them in Sect. 3.3. Then, a practical use case is detailed in Sect. 3.4, running a simple MLC algorithm over a few MLDs. Lastly, the usual performance evaluation metrics for MLC are introduced in Sect. 3.5 and they are used to analyze the results obtained from this experiment.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
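As an illustration of what a characterization metric for a multilabel dataset can look like, the following minimal sketch computes label cardinality (the average number of labels per instance) and label density (cardinality divided by the total number of labels). The abstract does not name the specific metrics the chapter covers, so the choice of these two measures, the function names and the toy data are assumptions made purely for illustration.

```python
from typing import List, Set


def label_cardinality(labelsets: List[Set[str]]) -> float:
    """Average number of active labels per instance."""
    return sum(len(labels) for labels in labelsets) / len(labelsets)


def label_density(labelsets: List[Set[str]], n_labels: int) -> float:
    """Label cardinality normalized by the total number of distinct labels."""
    return label_cardinality(labelsets) / n_labels


# Hypothetical toy dataset: only the labelset of each instance is kept.
toy_labelsets = [
    {"beach", "sunset"},
    {"urban"},
    {"beach", "urban", "sunset"},
    set(),  # an instance with no active label
]

card = label_cardinality(toy_labelsets)   # (2 + 1 + 3 + 0) / 4
dens = label_density(toy_labelsets, 3)    # card / 3 labels
print(card, dens)
```

A low density relative to the number of labels indicates a sparse label space, which is one reason such measures are reported alongside raw dataset sizes.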
Chapter 4. Transformation-Based Classifiers
Abstract
One of the first approaches to accomplish multilabel classification was based on data transformation techniques. These aim to produce binary or multiclass datasets from the original multilabel ones, thus allowing the use of traditional classification algorithms to solve the problem. The goal of this chapter is to introduce the most relevant transformation-based MLC methods, as well as to experimentally test the most popular ones. Section 4.1 provides a broad introduction to the chapter contents. The main data transformation approaches are defined in Sect. 4.2; then, several methods based on each approach are described in Sects. 4.3 and 4.4. Four of these methods are experimentally tested in Sect. 4.5. Section 4.6 summarizes the chapter.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
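The idea of turning a multilabel dataset into binary or multiclass ones can be sketched as follows. The two transformations shown are the ones commonly known as binary relevance (one binary dataset per label) and label powerset (each distinct labelset becomes one class); the abstract does not name the specific methods, so this pairing, the function names and the toy data are assumptions rather than the book's own code.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Instance = Tuple[Sequence[float], FrozenSet[str]]  # (features, labelset)


def binary_relevance(
    data: List[Instance], labels: Sequence[str]
) -> Dict[str, List[Tuple[Sequence[float], int]]]:
    """Build one binary dataset per label: target is 1 if the label is present."""
    return {
        label: [(x, int(label in y)) for x, y in data]
        for label in labels
    }


def label_powerset(
    data: List[Instance],
) -> Tuple[List[Tuple[Sequence[float], int]], Dict[FrozenSet[str], int]]:
    """Build one multiclass dataset: each distinct labelset gets a class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    transformed = []
    for x, y in data:
        # Assign the next free id the first time a labelset is seen.
        class_id = class_ids.setdefault(frozenset(y), len(class_ids))
        transformed.append((x, class_id))
    return transformed, class_ids


# Hypothetical toy data with two labels, "a" and "b".
toy = [
    ([0.1, 0.2], frozenset({"a", "b"})),
    ([0.3, 0.4], frozenset({"b"})),
    ([0.5, 0.6], frozenset({"a", "b"})),
]

br = binary_relevance(toy, ["a", "b"])
lp, ids = label_powerset(toy)
print([t for _, t in br["a"]])  # binary targets for label "a"
print([c for _, c in lp])       # multiclass targets
```

Each resulting dataset can then be handed to any traditional binary or multiclass learner. Binary relevance ignores co-occurrence between labels, while label powerset preserves it at the cost of a number of classes that can grow with every new label combination.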
Chapter 5. Adaptation-Based Classifiers
Abstract
While data transformation is a relatively straightforward way to do multilabel classification through traditional classifiers, an alternative approach based on adapting those classifiers to tackle the original multilabeled data has also been explored. This chapter aims to introduce many of these method adaptations. Most of them rely on traditional algorithms based on trees, neural networks, instance-based learning, etc. A general overview of them is provided in Sect. 5.1. Then, about thirty different proposals are detailed in Sects. 5.2-5.7, grouped according to the type of model they are founded on. A selection of four algorithms is experimentally tested in Sect. 5.8. Some final remarks are provided in Sect. 5.9.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 6. Ensemble-Based Classifiers
Abstract
Classification methods founded on training several models with a certain degree of heterogeneity, and then aggregating their predictions according to a particular strategy, tend to be a very effective solution. Ensembles have also been used to tackle some specific obstacles, such as imbalanced class distribution. The goal in this chapter is to present several multilabel ensemble-based solutions. Section 6.1 introduces this approach. Ensembles of binary classifiers are described in Sect. 6.2, while those based on multiclass methods are outlined in Sect. 6.3. Other kinds of ensembles will be briefly portrayed in Sect. 6.4. Some of these solutions are experimentally tested in Sect. 6.5, analyzing their predictive performance and running time. Lastly, Sect. 6.6 summarizes the chapter.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 7. Dimensionality Reduction
Abstract
High dimensionality is an extensively studied problem in machine learning. Usually, a high-dimensional input space challenges most classification algorithms, tending to produce more complex and less effective models. Multilabel data are also affected by high dimensionality in the output space, since many datasets have hundreds or even thousands of labels. This chapter aims to explain how high dimensionality affects multilabel classification, as well as the methods proposed to deal with this obstacle. A general overview of the curse of dimensionality in the multilabel field is provided in Sect. 7.1. Section 7.2 introduces feature space reduction techniques, outlining several specific proposals and testing how applying feature selection impacts the results of multilabel classifiers. Then, a similar discussion related to label space dimensionality is given in Sect. 7.3, also including some experimental results. Section 7.4 summarizes the chapter.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 8. Imbalance in Multilabel Datasets
Abstract
The frequency of class labels in many datasets is not even. On the contrary, it is quite usual for a certain class to appear in a large portion of the data samples while others are scarcely represented. This situation produces a problem generically labeled as class imbalance. Due to these differences between class distributions, a specific need arises: imbalanced learning. This chapter begins by introducing this task in Sect. 8.1. Then, the specific aspects of imbalance in the multilabel area are discussed in Sect. 8.2. Section 8.3 explains how imbalance in MLC has been faced, enumerating a considerable set of proposals. Some of them are experimentally evaluated in Sect. 8.4. Lastly, Sect. 8.5 summarizes the contents.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Chapter 9. Multilabel Software
Abstract
Multilabel classification and other tasks that learn from multilabeled data are relatively recent, with barely a decade of history behind them. When compared against binary and multiclass learning, the range of available datasets, frameworks, and other software tools is significantly scarcer. The goal of this last chapter is to provide the reader with the proper insight to take advantage of these software tools. A brief overview of them is offered in Sect. 9.1. Section 9.2 discusses the different multilabel file formats, enumerates the data repositories the MLDs can be downloaded from, and describes how to automate some tasks with the mldr.datasets R package. How to perform exploratory data analysis of MLDs is the main topic of Sect. 9.3. Then, the process to conduct experiments with multilabel data using different tools is outlined in Sect. 9.4.
Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus
Backmatter
Metadata
Title
Multilabel Classification
Authors
Francisco Herrera
Francisco Charte
Antonio J. Rivera
María J. del Jesus
Copyright year
2016
Electronic ISBN
978-3-319-41111-8
Print ISBN
978-3-319-41110-1
DOI
https://doi.org/10.1007/978-3-319-41111-8