
2022 | Book

Deep Generative Modeling


About this book

This textbook tackles the problem of formulating AI systems by combining probabilistic modeling and deep learning. Moreover, it goes beyond typical predictive modeling and brings together supervised learning and unsupervised learning. The resulting paradigm, called deep generative modeling, utilizes the generative perspective on perceiving the surrounding world. It assumes that each phenomenon is driven by an underlying generative process that defines a joint distribution over random variables and their stochastic interactions, i.e., how events occur and in what order. The adjective "deep" comes from the fact that the distribution is parameterized using deep neural networks. There are two distinct traits of deep generative modeling. First, the application of deep neural networks allows rich and flexible parameterization of distributions. Second, the principled manner of modeling stochastic dependencies using probability theory ensures rigorous formulation and prevents potential flaws in reasoning. Moreover, probability theory provides a unified framework where the likelihood function plays a crucial role in quantifying uncertainty and defining objective functions.

Deep Generative Modeling is designed to appeal to curious students, engineers, and researchers with a modest mathematical background in undergraduate calculus, linear algebra, and probability theory, and with the basics of machine learning, deep learning, and programming in Python and PyTorch (or other deep learning libraries). It will appeal to students and researchers from a variety of backgrounds, including computer science, engineering, data science, physics, and bioinformatics, who wish to become familiar with deep generative modeling. To engage the reader, the book introduces fundamental concepts with specific examples and code snippets. The full code accompanying the book is available on GitHub.

The ultimate aim of the book is to outline the most important techniques in deep generative modeling and, eventually, enable readers to formulate new models and implement them.

Table of Contents

Frontmatter
Chapter 1. Why Deep Generative Modeling?
Abstract
Before we start thinking about (deep) generative modeling, let us consider a simple example. Imagine we have trained a deep neural network that classifies images (\(\mathbf{x} \in \mathbb{Z}^{D}\)) of animals (\(y \in \mathcal{Y}\), where \(\mathcal{Y} = \{cat, dog, horse\}\)). Further, let us assume that this neural network is trained so well that it always predicts the correct class with high probability p(y|x). So far so good, right? A problem could occur, though. As pointed out by Szegedy et al. (Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, 2014), adding noise to images can result in a completely wrong classification. An example of such a situation is presented in Fig. 1.1, where adding noise shifts the predicted label probabilities even though the image is barely changed (at least to us, human beings).
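The following is a minimal sketch (not from the book) of the kind of experiment the abstract describes: comparing a classifier's predicted probabilities p(y|x) on a clean image and on the same image with additive noise. The toy architecture and the noise scale are illustrative assumptions.

```python
# A minimal sketch: probing how additive noise can shift p(y|x).
# The classifier below is an untrained stand-in; architecture and noise scale are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 3),                       # 3 classes: cat, dog, horse
)

x = torch.rand(1, 1, 28, 28)                 # a "clean" image
x_noisy = x + 0.3 * torch.randn_like(x)      # the same image with additive noise

with torch.no_grad():
    p_clean = torch.softmax(classifier(x), dim=-1)
    p_noisy = torch.softmax(classifier(x_noisy), dim=-1)

print("p(y|x)      :", p_clean.numpy().round(3))
print("p(y|x+noise):", p_noisy.numpy().round(3))
```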
Jakub M. Tomczak
Chapter 2. Autoregressive Models
Abstract
Before we start discussing how we can model the distribution p(x), let us refresh our memory about the core rules of probability theory, namely, the sum rule and the product rule. Let us introduce two random variables x and y. Their joint distribution is p(x, y). The product rule allows us to factorize the joint distribution in two ways, namely:
$$\displaystyle p(\mathbf{x}, \mathbf{y}) = p(\mathbf{x} | \mathbf{y})\, p(\mathbf{y}) = p(\mathbf{y} | \mathbf{x})\, p(\mathbf{x}) . $$
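As a quick numerical illustration of the product rule (my own example, not the book's code), one can verify both factorizations on a small discrete joint distribution:

```python
# The product rule on a toy discrete joint p(x, y): both factorizations recover the joint.
import torch

p_xy = torch.tensor([[0.10, 0.20],      # rows index x, columns index y
                     [0.30, 0.40]])

p_x = p_xy.sum(dim=1)                   # marginal p(x), via the sum rule
p_y = p_xy.sum(dim=0)                   # marginal p(y)
p_y_given_x = p_xy / p_x[:, None]       # conditional p(y|x)
p_x_given_y = p_xy / p_y[None, :]       # conditional p(x|y)

assert torch.allclose(p_y_given_x * p_x[:, None], p_xy)   # p(y|x) p(x) = p(x, y)
assert torch.allclose(p_x_given_y * p_y[None, :], p_xy)   # p(x|y) p(y) = p(x, y)
```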
Jakub M. Tomczak
Chapter 3. Flow-Based Models
Abstract
So far, we have discussed a class of deep generative models that model the distribution p(x) directly in an autoregressive manner. The main advantage of ARMs is that they can learn long-range statistics and, as a consequence, are powerful density estimators. However, their drawback is that they are parameterized in an autoregressive manner; hence, sampling is a rather slow process. Moreover, they lack a latent representation, so it is not obvious how to manipulate their internal data representation, which makes them less appealing for tasks like compression or metric learning. In this chapter, we present a different approach to direct modeling of p(x). However, before we start our considerations, we will discuss a simple example.
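To give a flavor of the alternative approach covered in this chapter, here is a minimal sketch (my own illustration, with an assumed element-wise affine transformation rather than any model from the book) of the change-of-variables formula that flow-based models are built on: log p(x) = log p_Z(f⁻¹(x)) + log |det J_{f⁻¹}(x)|.

```python
# Change of variables with a single invertible affine map x = scale * z + shift.
import torch

D = 2
base = torch.distributions.Normal(torch.zeros(D), torch.ones(D))  # p_Z(z)

scale = torch.tensor([2.0, 0.5])        # assumed invertible element-wise flow
shift = torch.tensor([1.0, -1.0])

def log_prob_x(x):
    z = (x - shift) / scale                     # inverse transformation f^{-1}(x)
    log_det = -torch.log(scale.abs()).sum()     # log |det Jacobian| of the inverse
    return base.log_prob(z).sum(-1) + log_det

x = torch.tensor([[0.5, 0.5]])
print(log_prob_x(x))                            # exact log-likelihood under this toy flow
```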
Jakub M. Tomczak
Chapter 4. Latent Variable Models
Abstract
In the previous chapters, we discussed two approaches to learning p(x): autoregressive models (ARMs) in Chap. 2 and flow-based models (or flows for short) in Chap. 3. Both ARMs and flows model the likelihood function directly, that is, either by factorizing the distribution and parameterizing the conditional distributions \(p(x_{d} | \mathbf{x}_{<d})\) as in ARMs or by utilizing invertible transformations (neural networks) for the change-of-variables formula as in flows. Now, we will discuss a third approach that introduces latent variables.
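A minimal sketch of the generative process that latent variable models define (illustrative assumptions only: the toy decoder, the dimensionalities, and the Bernoulli observation model are mine, not the book's):

```python
# Latent variable model generative process: z ~ p(z), then x ~ p_theta(x|z).
import torch
import torch.nn as nn

latent_dim, data_dim = 2, 4
decoder = nn.Sequential(                 # parameterizes p_theta(x|z)
    nn.Linear(latent_dim, 16), nn.Tanh(),
    nn.Linear(16, data_dim),
)

z = torch.randn(5, latent_dim)           # z ~ p(z) = N(0, I)
logits = decoder(z)                      # map latents to distribution parameters
x = torch.bernoulli(torch.sigmoid(logits))  # x ~ p_theta(x|z), here Bernoulli "pixels"
print(x)
```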
Jakub M. Tomczak
Chapter 5. Hybrid Modeling
Abstract
In Chap. 1, I tried to convince you that learning the conditional distribution p(y|x) is not enough and, instead, we should focus on the joint distribution p(x, y) factorized as follows:
$$\displaystyle p(\mathbf{x}, y) = p(y|\mathbf{x})\, p(\mathbf{x}) . $$
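A minimal sketch of a hybrid objective built on this factorization (hedged: the helper function, the weighting parameter name, and the toy values are my own illustration, not the book's code), combining a discriminative term ln p(y|x) with a generative term ln p(x):

```python
# Hybrid objective ln p(y|x) + lam * ln p(x); with lam = 1 it is the log-joint ln p(x, y).
import torch

def hybrid_objective(log_p_y_given_x, log_p_x, lam=1.0):
    """Per-example hybrid objective; lam weighs the generative part."""
    return log_p_y_given_x + lam * log_p_x

log_p_y_given_x = torch.tensor([-0.2, -1.5])   # toy values from a classifier head
log_p_x = torch.tensor([-310.0, -295.0])       # toy values from a density model
print(hybrid_objective(log_p_y_given_x, log_p_x, lam=1.0).mean())
```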
Jakub M. Tomczak
Chapter 6. Energy-Based Models
Abstract
So far, we have discussed various deep generative models for modeling the marginal distribution over observable variables (e.g., images), p(x), such as autoregressive models (ARMs), flow-based models (flows, for short), Variational Auto-Encoders (VAEs), and hierarchical models like hierarchical VAEs and diffusion-based deep generative models (DDGMs). However, from the very beginning, we have advocated for using deep generative modeling in the context of finding the joint distribution over observables and decision variables, factorized as p(x, y) = p(y|x)p(x). After taking the logarithm of the joint, we obtain two additive components: \(\ln p(\mathbf{x}, y) = \ln p(y | \mathbf{x}) + \ln p(\mathbf{x})\). We outlined how such a joint model could be formulated and trained in the hybrid modeling setting (see Chap. 5). The drawback of hybrid modeling, though, is the necessity of weighting both distributions, i.e., \(\ell(\mathbf{x}, y; \lambda) = \ln p(y | \mathbf{x}) + \lambda \ln p(\mathbf{x})\), and for λ ≠ 1 this objective does not correspond to the log-likelihood of the joint distribution. The question is whether it is possible to formulate a model that can be learned with λ = 1. Here, we are going to discuss a potential solution to this problem using probabilistic energy-based models (EBMs).
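One known construction in this direction (a hedged sketch in the spirit of joint energy-based models, not necessarily the book's exact formulation) reuses a classifier's logits f(x): softmax over the logits gives p(y|x), while logsumexp over them gives an unnormalized ln p(x), so the two terms of ln p(x, y) share one network and λ = 1 comes for free, up to the log-partition constant.

```python
# Classifier logits interpreted as an energy-based joint model (JEM-style sketch).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 3))  # logits f(x) for 3 classes

x = torch.rand(8, 1, 28, 28)
logits = net(x)

log_p_y_given_x = torch.log_softmax(logits, dim=-1)       # discriminative part, ln p(y|x)
log_p_x_unnorm = torch.logsumexp(logits, dim=-1)          # -energy(x), unnormalized ln p(x)
print(log_p_y_given_x.shape, log_p_x_unnorm.shape)
```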
Jakub M. Tomczak
Chapter 7. Generative Adversarial Networks
Abstract
When we discussed latent variable models, we claimed that they naturally define a generative process: first sample latents z ∼ p(z) and then generate observables x ∼ \(p_{\theta}(\mathbf{x}|\mathbf{z})\). That is nice! However, a problem appears when we start thinking about training. To be more precise, the training objective is an issue. Why? Well, probability theory tells us to get rid of all unobserved random variables by marginalizing them out.
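To make the difficulty concrete, here is a minimal sketch (my own illustration, with an assumed toy decoder and Bernoulli likelihood) of naively marginalizing the latents by Monte Carlo, p(x) ≈ (1/S) Σ_s p(x|z_s) with z_s ∼ p(z); for high-dimensional x this estimator is extremely poor, which is part of what motivates likelihood-free approaches such as GANs.

```python
# Naive Monte Carlo marginalization of latent variables.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 4))  # p(x|z) params

def log_p_x_mc(x, num_samples=1000):
    z = torch.randn(num_samples, 2)                        # z ~ p(z)
    logits = decoder(z)                                    # Bernoulli parameters per sample
    log_px_given_z = torch.distributions.Bernoulli(logits=logits).log_prob(x).sum(-1)
    # log (1/S) sum_s p(x|z_s) computed stably with logsumexp
    return torch.logsumexp(log_px_given_z, dim=0) - torch.log(torch.tensor(float(num_samples)))

x = torch.bernoulli(torch.full((4,), 0.5))                 # a toy binary observation
print(log_p_x_mc(x))
```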
Jakub M. Tomczak
Chapter 8. Deep Generative Modeling for Neural Compression
Abstract
In December 2020, Facebook reported having around 1.8 billion daily active users and around 2.8 billion monthly active users. Assuming that users uploaded, on average, a single photo each day, the resulting volume of data would give a very rough (let me stress it: a very rough) estimate of around 3000 TB of new images per day. This single case of Facebook alone already shows the potentially huge costs associated with storing and transmitting data. In the digital era, we can simply say this: handling data efficiently and effectively (i.e., faster and smaller) means more money in the pocket.
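As a back-of-the-envelope check of that figure (the average photo size used here is my assumption, not given in the text), taking roughly 1.7 MB per uploaded photo gives
$$ 1.8 \times 10^{9}\ \text{photos/day} \times 1.7\ \text{MB/photo} \approx 3.1 \times 10^{9}\ \text{MB} \approx 3000\ \text{TB per day}. $$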
Jakub M. Tomczak
Backmatter
Metadata
Title
Deep Generative Modeling
Author
Assist. Prof. Jakub M. Tomczak
Copyright year
2022
Electronic ISBN
978-3-030-93158-2
Print ISBN
978-3-030-93157-5
DOI
https://doi.org/10.1007/978-3-030-93158-2
