
2021 | Book

Machine Learning


About this Book

Machine Learning, a vital and core area of artificial intelligence (AI), is propelling the AI field ever further and making it one of the most compelling areas of computer science research. This textbook offers a comprehensive and unbiased introduction to almost all aspects of machine learning, from the fundamentals to advanced topics. It consists of 16 chapters divided into three parts: Part 1 (Chapters 1-3) introduces the fundamentals of machine learning, including terminology, basic principles, evaluation, and linear models; Part 2 (Chapters 4-10) presents classic and commonly used machine learning methods, such as decision trees, neural networks, support vector machines, Bayesian classifiers, ensemble methods, clustering, dimensionality reduction, and metric learning; Part 3 (Chapters 11-16) introduces advanced topics, covering feature selection and sparse learning, computational learning theory, semi-supervised learning, probabilistic graphical models, rule learning, and reinforcement learning. Each chapter includes exercises and further reading, so that readers can explore areas of interest.

The book can be used as an undergraduate or postgraduate textbook for computer science, computer engineering, electrical engineering, data science, and related majors. It is also a useful reference resource for researchers and practitioners of machine learning.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
After a drizzle, we take a walk on the wet street. Feeling the gentle breeze and seeing the sunset glow, we bet the weather will be nice tomorrow. Walking to a fruit stand, we pick up a green watermelon with a curly root and a muffled knocking sound; while hoping the watermelon is ripe, we also expect good academic marks this semester after all the hard work on our studies. We wish readers to share the same confidence in their studies, but to begin with, let us have an informal discussion of what machine learning is.
Zhi-Hua Zhou
Chapter 2. Model Selection and Evaluation
Abstract
In general, the proportion of incorrectly classified samples to the total number of samples is called the error rate; that is, if a out of m samples are misclassified, then the error rate is \(E=a/m\).
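As a concrete illustration, here is a minimal sketch of this definition (the function and variable names are ours, not the book's):

```python
def error_rate(y_true, y_pred):
    """Error rate E = a/m: misclassified count a over total sample count m."""
    m = len(y_true)
    a = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return a / m

# 2 of 5 predictions are wrong, so E = 2/5 = 0.4.
print(error_rate([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```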
Zhi-Hua Zhou
Chapter 3. Linear Models
Abstract
Let \(\boldsymbol{x} = (x_1;x_2;\ldots ;x_d)\) be a sample described by d variables, where \(\boldsymbol{x}\) takes the value \(x_i\) on the ith variable.
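To preview how this notation is put to work, the linear models of this chapter make predictions through a linear combination of the d variables; in the standard form (our summary, not a quotation from the chapter):
\[ f(\boldsymbol{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b = \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b, \]
where \(\boldsymbol{w} = (w_1; w_2; \ldots ; w_d)\) and b are learned from data.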
Zhi-Hua Zhou
Chapter 4. Decision Trees
Abstract
Decision trees are a popular class of machine learning methods. Taking binary classification as an example, we can regard the task as deciding the answer to the question "Is this instance positive?" As the name suggests, a decision tree makes decisions based on a tree structure, which is also a common decision-making mechanism used by humans. For example, to answer the question "Is this watermelon ripe?" we usually go through a series of judgments or sub-decisions: we first consider "What is the color?" If it is green, then "What is the shape of the root?" If it is curly, then "What is the knocking sound?" Finally, based on these observations, we decide whether the watermelon is ripe or not. Such a decision process is illustrated in Figure 4.1.
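That sequence of judgments can be pictured as nested conditionals; a minimal sketch of the watermelon decisions just described (the branches are simplified, and the function is ours):

```python
def is_ripe(color, root, sound):
    """A hand-written decision tree mirroring the judgments in Figure 4.1."""
    if color != "green":
        return False            # first judgment: what is the color?
    if root != "curly":
        return False            # second judgment: what is the shape of the root?
    return sound == "muffled"   # final judgment: what is the knocking sound?

print(is_ripe("green", "curly", "muffled"))  # True
```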
Zhi-Hua Zhou
Chapter 5. Neural Networks
Abstract
Research on neural networks started quite a long time ago, and it has become a broad and interdisciplinary research field today. Though neural networks have various definitions across disciplines, this book uses a widely adopted one: “Artificial neural networks are massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which are intended to interact with the objects of the real world in the same way as biological nervous systems do” (Kohonen 1988). In the context of machine learning, neural networks refer to “neural networks learning”, or in other words, the intersection of machine learning research and neural networks research.
Zhi-Hua Zhou
Chapter 6. Support Vector Machine
Abstract
Given a training set \(D = \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \ldots , (\boldsymbol{x}_m, y_m)\}\), where \(y_i\in \{-1,+1\}\), the basic idea of classification is to use the training set D to find a separating hyperplane in the sample space that separates samples of different classes. However, there could be multiple qualified separating hyperplanes, as shown in Figure 6.1; which one should be chosen?
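A hyperplane \((\boldsymbol{w}, b)\) qualifies when every sample lies on its correct side; a minimal check of that condition, illustrating why several hyperplanes can all qualify (the data and names are ours):

```python
def separates(w, b, D):
    """True if sign(w . x + b) agrees with the label y for every (x, y) in D."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    return all(y * (dot(w, x) + b) > 0 for x, y in D)

# Two different hyperplanes both separate this toy set,
# which is exactly the ambiguity Figure 6.1 depicts.
D = [((1.0, 1.0), +1), ((-1.0, -1.0), -1)]
print(separates((1.0, 1.0), 0.0, D))   # True
print(separates((1.0, 0.5), 0.1, D))   # also True
```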
Zhi-Hua Zhou
Chapter 7. Bayes Classifiers
Abstract
Bayesian decision theory is a fundamental decision-making approach under the probability framework. In the ideal situation where all relevant probabilities are known, Bayesian decision theory makes optimal classification decisions based on these probabilities and the costs of misclassification. In the following, we demonstrate the basic idea of Bayesian decision theory with multiclass classification.
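The resulting decision rule picks the class with the smallest expected misclassification cost given the posterior probabilities; a minimal sketch (the posteriors and costs below are made-up illustrations, not figures from the book):

```python
def bayes_decision(posteriors, cost):
    """Choose the class c minimizing expected risk: sum_j cost[c][j] * P(j | x)."""
    classes = list(posteriors)
    risk = lambda c: sum(cost[c][j] * posteriors[j] for j in classes)
    return min(classes, key=risk)

posteriors = {"ripe": 0.7, "unripe": 0.3}            # P(class | x)
cost = {"ripe":   {"ripe": 0, "unripe": 1},          # cost[decision][true class]
        "unripe": {"ripe": 5, "unripe": 0}}
print(bayes_decision(posteriors, cost))              # "ripe": risk 0.3 vs. 3.5
```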
Zhi-Hua Zhou
Chapter 8. Ensemble Learning
Abstract
Ensemble learning, also known as multiple classifier systems or committee-based learning, trains and combines multiple learners to solve a learning problem.
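The simplest combination strategy is a majority vote over the base learners' predictions; a minimal sketch (entirely illustrative):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class labels predicted by several learners for one sample."""
    return Counter(predictions).most_common(1)[0][0]

# Three base learners vote on one sample; the ensemble outputs the majority.
print(majority_vote([+1, -1, +1]))  # 1
```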
Zhi-Hua Zhou
Chapter 9. Clustering
Abstract
Unsupervised learning aims to discover underlying properties and patterns from unlabeled training samples and lays the foundation for further data analysis. Among various unsupervised learning techniques, the most researched and applied one is clustering.
Other unsupervised learning tasks include density estimation, anomaly detection, etc.
Zhi-Hua Zhou
Chapter 10. Dimensionality Reduction and Metric Learning
Abstract
k-Nearest Neighbor (kNN) is a commonly used supervised learning method with a simple mechanism: given a testing sample, find the k nearest training samples under some distance metric, and then use these k "neighbors" to make predictions. Typically, for classification problems, voting is used to predict the testing sample as the most frequent class label among the k neighbors; for regression problems, averaging is used to predict the testing sample as the average of the k real-valued outputs. In addition, samples can be weighted by distance so that a closer sample is assigned a higher weight.
The principle of kNN agrees with the proverb that "one takes the behavior of one's company."
See Sect. 8.4.
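A minimal sketch of the classification variant just described, with plain Euclidean distance and majority voting (the data and names are ours):

```python
from collections import Counter

def knn_classify(x, D, k):
    """Predict the majority label among the k training samples nearest to x."""
    dist = lambda u, v: sum((ui - vi) ** 2 for ui, vi in zip(u, v)) ** 0.5
    neighbors = sorted(D, key=lambda s: dist(x, s[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

D = [((0.0, 0.0), "unripe"), ((0.1, 0.2), "unripe"),
     ((1.0, 1.0), "ripe"), ((0.9, 1.1), "ripe")]
print(knn_classify((0.95, 1.0), D, k=3))  # "ripe"
```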
Zhi-Hua Zhou
Chapter 11. Feature Selection and Sparse Learning
Abstract
Watermelons can be described by many attributes, such as color, root, sound, texture, and surface, but experienced people can determine the ripeness using only the root and sound information. In other words, not all attributes are equally important for the learning task. In machine learning, attributes are also called features. Features that are useful for the current learning task are called relevant features, and useless ones are called irrelevant features. The process of selecting relevant features from a given feature set is called feature selection.
Zhi-Hua Zhou
Chapter 12. Computational Learning Theory
Abstract
As the name suggests, computational learning theory is about "learning" by "computation" and is the theoretical foundation of machine learning. It aims to analyze the difficulty of learning problems, provide theoretical guarantees for learning algorithms, and guide algorithm design based on theoretical analysis.
Zhi-Hua Zhou
Chapter 13. Semi-Supervised Learning
Abstract
We come to the watermelon field during the harvest season, and the ground is covered with watermelons. The melon farmer brings over a handful of melons and says that they are all ripe, then points at a few melons on the ground and says that these are not ripe and would take a few more days to ripen. Based on this information, can we build a model to determine which melons in the field are ripe for picking? Certainly, we can use the ripe and unripe watermelons indicated by the farmer as positive and negative samples to train a classifier. However, are a handful of melons enough to serve as training samples? Can we make use of all the watermelons in the field as well?
Zhi-Hua Zhou
Chapter 14. Probabilistic Graphical Models
Abstract
The most important problem in machine learning is to estimate and infer the values of unknown variables (e.g., class labels) based on observed evidence (e.g., training samples). Probabilistic models provide a framework that treats learning problems as computing the probability distributions of variables.
Zhi-Hua Zhou
Chapter 15. Rule Learning
Abstract
In machine learning, rules usually refer to logic rules in the form of "if \(\ldots ,\) then \(\ldots \)" that can describe regular patterns or domain concepts with clear semantics (Fürnkranz et al. 2012). Rule learning is about learning a set of rules from training data for predicting unseen samples.
Broadly speaking, all predictive models can be seen as one rule or a set of rules. In rule learning, we refer to logic rules with the term "logic" omitted.
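One way to picture such "if ..., then ..." rules in code, with each rule as a list of attribute tests plus a conclusion (a toy encoding of ours, not the book's formalism):

```python
# A rule "fires" when all of its attribute tests hold;
# the first firing rule determines the prediction.
rules = [
    ([("root", "curly"), ("sound", "muffled")], "ripe"),
    ([("root", "straight")], "unripe"),
]

def apply_rules(sample, rules, default="unripe"):
    for conditions, conclusion in rules:
        if all(sample.get(attr) == value for attr, value in conditions):
            return conclusion
    return default

print(apply_rules({"root": "curly", "sound": "muffled"}, rules))  # "ripe"
```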
Zhi-Hua Zhou
Chapter 16. Reinforcement Learning
Abstract
Planting watermelon involves many steps, such as seed selection, regular watering, fertilization, weeding, and insect control. We usually do not know the quality of the watermelons until harvesting. If we consider the harvesting of ripe watermelons as a reward for planting watermelons, then we do not receive the final reward immediately after each step of planting, e.g., fertilization. We do not even know the exact impact of the current action on the final reward. Instead, we only receive feedback about the current status, e.g., the watermelon seedling looks healthier.
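The delayed-reward setting sketched here can be pictured with a tiny episode loop in which no reward arrives until harvest (entirely illustrative):

```python
import random

def plant_episode(actions=("select seed", "water", "fertilize", "weed")):
    """Take every planting action first; the reward arrives only at the end."""
    for action in actions:
        # Intermediate feedback is state information, not reward.
        print(f"{action}: seedling state observed, final reward still unknown")
    return 1 if random.random() < 0.5 else 0  # harvest: ripe (1) or not (0)

print("final reward:", plant_episode())
```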
Zhi-Hua Zhou
Backmatter
Metadata
Title
Machine Learning
Author
Zhi-Hua Zhou
Copyright Year
2021
Publisher
Springer Singapore
Electronic ISBN
978-981-15-1967-3
Print ISBN
978-981-15-1966-6
DOI
https://doi.org/10.1007/978-981-15-1967-3
