
2024 | Book

Deep Generative Modeling


About this book

This first comprehensive book on the models behind Generative AI has been thoroughly revised to cover all major classes of deep generative models: mixture models, probabilistic circuits, autoregressive models, flow-based models, latent variable models, GANs, hybrid models, score-based generative models, energy-based models, and large language models. In addition, Generative AI Systems are discussed, demonstrating how deep generative models can be used for, among other applications, neural compression.

Deep Generative Modeling is designed to appeal to curious students, engineers, and researchers with a modest mathematical background in undergraduate calculus, linear algebra, probability theory, and the basics of machine learning, deep learning, and programming in Python and PyTorch (or other deep learning libraries). It should find interest among students and researchers from a variety of backgrounds, including computer science, engineering, data science, physics, and bioinformatics who wish to get familiar with deep generative modeling.
To engage the reader, the book introduces fundamental concepts with specific examples and code snippets. The full code accompanying the book is available on the author's GitHub site: github.com/jmtomczak/intro_dgm

The ultimate aim of the book is to outline the most important techniques in deep generative modeling and, eventually, enable readers to formulate new models and implement them.

Table of Contents

Frontmatter
Chapter 1. Why Deep Generative Modeling?
Abstract
Before we start thinking about (deep) generative modeling, let us consider a simple example. Imagine we have trained a deep neural network that classifies images (\(\mathbf{x} \in \mathbb{Z}^{D}\)) of animals (\(y \in \mathcal{Y}\), and \(\mathcal{Y} = \{cat, dog, horse\}\)). Further, let us assume that this neural network is trained really well so that it always assigns the proper class a high probability \(p(y|\mathbf{x})\). So far so good, right? A problem can occur, though. As pointed out in [1], adding noise to an image could result in a completely wrong classification. An example of such a situation is presented in Fig. 1.1, where adding noise shifts the predicted probabilities of the labels even though the image is barely changed (at least to us, human beings).
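To see this effect in a few lines of code, here is a minimal PyTorch sketch (not taken from the book; the pretrained classifier and the random input tensor are placeholders standing in for the cat/dog/horse classifier of the example):

import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

# Placeholder classifier; the book's example assumes a well-trained animal classifier.
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
x = torch.rand(1, 3, 224, 224)  # stand-in image; in practice, a real photo of an animal

with torch.no_grad():
    p_clean = F.softmax(model(x), dim=1)                        # p(y|x) for the original image
    x_noisy = (x + 0.05 * torch.randn_like(x)).clamp(0.0, 1.0)  # barely visible noise
    p_noisy = F.softmax(model(x_noisy), dim=1)                  # p(y|x) after adding noise

# The predicted probabilities (and even the winning label) can shift noticeably,
# although the two images look nearly identical to a human.
print(p_clean.argmax(dim=1).item(), p_noisy.argmax(dim=1).item())
print((p_clean - p_noisy).abs().max().item())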
Jakub M. Tomczak
Chapter 2. Probabilistic Modeling: From Mixture Models to Probabilistic Circuits
Abstract
Let us imagine cats. Most people like cats, and some people are crazy in love with cats. There are ginger cats, black cats, big cats, small cats, puffy cats, and furless cats. In fact, there are many different kinds of cats. However, when I say the word "cat," everyone has some kind of a cat in their mind. One can close their eyes and generate a picture of a cat, either their own cat or a neighbor's cat. Further, this generated cat is located somewhere, e.g., sleeping on a couch or chasing a fly in a garden, during the night or during the day, and so on. Probably, we can agree at this point that there are infinitely many possible scenarios of cats in some environment.
Jakub M. Tomczak
Chapter 3. Autoregressive Models
Abstract
Before we start discussing how we can model the distribution p(x), we refresh our memory about the core rules of probability theory, namely, the sum rule and the product rule. Let us introduce two random variables x and y.
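For completeness, these two rules can be written as

\[
\text{sum rule:}\quad p(x) = \sum_{y} p(x, y), \qquad \text{product rule:}\quad p(x, y) = p(y|x)\,p(x) = p(x|y)\,p(y),
\]

where the sum becomes an integral for a continuous \(y\).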
Jakub M. Tomczak
Chapter 4. Flow-Based Models
Abstract
So far, we have discussed a class of deep generative models that model the distribution p(x) directly in an autoregressive manner. The main advantage of ARMs is that they can learn long-range statistics and, as a consequence, are powerful density estimators. However, their drawback is that they are parameterized in an autoregressive manner; hence, sampling is a rather slow process. Moreover, they lack a latent representation; therefore, it is not obvious how to manipulate their internal data representation, which makes them less appealing for tasks like compression or metric learning. In this chapter, we present a different approach to directly modeling p(x). However, before we start our considerations, we will discuss a simple example.
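To illustrate why sampling from an ARM is sequential (and hence slow), here is a minimal sketch, not the book's code; conditional_net is an assumed network that, thanks to causal masking, outputs for every position d the logits of \(p(x_{d}|\mathbf{x}_{<d})\):

import torch

def sample_arm(conditional_net, D, num_values=256):
    # Sampling proceeds one dimension at a time: x_d depends on all previously sampled x_<d.
    x = torch.zeros(1, D, dtype=torch.long)
    for d in range(D):
        logits = conditional_net(x)[:, d, :]            # logits of p(x_d | x_<d)
        probs = torch.softmax(logits.float(), dim=-1)
        x[:, d] = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return x  # D sequential forward passes -- this is what makes sampling slow for large D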
Jakub M. Tomczak
Chapter 5. Latent Variable Models
Abstract
In the previous sections, we discussed two approaches to learning p(x): autoregressive models (ARMs) in Chap. 3 and flow-based models (or flows for short) in Chap. 4. Both ARMs and flows model the likelihood function directly, that is, either by factorizing the distribution and parameterizing conditional distributions \(p(x_{d}|\mathbf{x}_{<d})\) as in ARMs or by utilizing invertible transformations (neural networks) for the change of variables formula as in flows. Now, we will discuss a third approach that introduces latent variables.
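To fix notation (with a base distribution \(p(\mathbf{z})\) and an invertible transformation \(f\)), the two approaches model the likelihood as

\[
p(\mathbf{x}) = \prod_{d=1}^{D} p(x_{d}|\mathbf{x}_{<d}) \quad \text{(ARMs)}, \qquad
p(\mathbf{x}) = p\!\left(\mathbf{z} = f^{-1}(\mathbf{x})\right) \left| \det \frac{\partial f^{-1}(\mathbf{x})}{\partial \mathbf{x}} \right| \quad \text{(flows)}.
\]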
Jakub M. Tomczak
Chapter 6. Hybrid Modeling
Abstract
In Chap. 1, I tried to convince you that learning the conditional distribution p(y|x) is not enough and, instead, we should focus on the joint distribution p(x, y).
Jakub M. Tomczak
Chapter 7. Energy-Based Models
Abstract
So far, we have discussed various deep generative models for modeling the marginal distribution over observable variables (e.g., images), p(x), such as autoregressive models (ARMs), flow-based models (flows, for short), variational autoencoders (VAEs), and hierarchical models like hierarchical VAEs and diffusion-based deep generative models (DDGMs). However, from the very beginning, we have advocated using deep generative modeling in the context of finding the joint distribution over observables and decision variables that is factorized as p(x, y) = p(y|x)p(x). After taking the logarithm of the joint, we obtain two additive components: \(\ln p(\mathbf{x}, y) = \ln p(y | \mathbf{x}) + \ln p(\mathbf{x})\). We outlined how such a joint model could be formulated and trained in the hybrid modeling setting (see Chap. 6). The drawback of hybrid modeling, though, is the necessity of weighting the two distributions, i.e., \(\ell(\mathbf{x}, y; \lambda) = \ln p(y | \mathbf{x}) + \lambda \ln p(\mathbf{x})\), and for λ ≠ 1, this objective does not correspond to the log-likelihood of the joint distribution. The question is whether it is possible to formulate a model that can be learned with λ = 1. Here, we are going to discuss a potential solution to this problem using probabilistic energy-based models (EBMs) (LeCun et al. (2006) Predict Struct Data 1).
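One way to see how an EBM can achieve this, sketched here for illustration only (the notation \(f_{\theta}(\mathbf{x})[y]\) for the y-th logit of a classifier network is an assumption, not taken from the abstract), is to treat the logits as negative energies of the joint:

\[
p_{\theta}(\mathbf{x}, y) = \frac{\exp\big(f_{\theta}(\mathbf{x})[y]\big)}{Z_{\theta}}, \qquad
p_{\theta}(y|\mathbf{x}) = \frac{\exp\big(f_{\theta}(\mathbf{x})[y]\big)}{\sum_{y'} \exp\big(f_{\theta}(\mathbf{x})[y']\big)}, \qquad
p_{\theta}(\mathbf{x}) = \frac{\sum_{y} \exp\big(f_{\theta}(\mathbf{x})[y]\big)}{Z_{\theta}},
\]

so that \(\ln p_{\theta}(\mathbf{x}, y) = \ln p_{\theta}(y|\mathbf{x}) + \ln p_{\theta}(\mathbf{x})\) holds exactly, i.e., with λ = 1; the price to pay is the intractable partition function \(Z_{\theta}\).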
Jakub M. Tomczak
Chapter 8. Generative Adversarial Networks
Abstract
Once we discussed latent variable models, we claimed that they naturally define a generative process by first sampling latents \(\mathbf{z} \sim p(\mathbf{z})\) and then generating observables \(\mathbf{x} \sim p_{\theta}(\mathbf{x}|\mathbf{z})\). That is nice! However, a problem appears when we start thinking about training. To be more precise, the training objective is an issue. Why? Well, probability theory tells us to get rid of all unobserved random variables by marginalizing them out.
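Written out, this marginalization is

\[
p_{\theta}(\mathbf{x}) = \int p_{\theta}(\mathbf{x}|\mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z},
\]

an integral that is, in general, intractable for a deep, nonlinear generator, which is exactly why the training objective becomes an issue.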
Jakub M. Tomczak
Chapter 9. Score-Based Generative Models
Abstract
I must say that it is hard to come up with a shorter definition of contemporary generative modeling. Once we look at various classes of models, we immediately notice that this is exactly what we try to do: generate data from noise! Don't believe me? OK, let us have a look at how various classes of generative models work.
Jakub M. Tomczak
Chapter 10. Deep Generative Modeling for Neural Compression
Abstract
In December 2020, Facebook reported having around 1.8 billion daily active users and around 2.8 billion monthly active users (Facebook reports fourth quarter and full year 2020 results, 2020). Assuming that users uploaded, on average, a single photo each day, the resulting volume of data would give a very rough (let me stress it, a very rough) estimate of around 3000 TB of new images per day. This single case of Facebook alone already shows the potentially great costs associated with storing and transmitting data. In the digital era, we can simply say this: handling data efficiently and effectively (i.e., faster and smaller) means more money in the pocket.
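As a quick back-of-the-envelope check of that figure (a sketch with assumed numbers only: one photo per daily active user and an average compressed photo size of roughly 1.7 MB):

# Rough sanity check of the ~3000 TB/day estimate (all numbers are illustrative assumptions).
photos_per_day = 1.8e9        # one photo per daily active user (assumption)
avg_photo_size_mb = 1.7       # assumed average size of a compressed photo, in MB

total_tb_per_day = photos_per_day * avg_photo_size_mb / 1e6   # MB -> TB (decimal units)
print(f"~{total_tb_per_day:.0f} TB of new images per day")    # prints ~3060 TB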
Jakub M. Tomczak
Chapter 11. From Large Language Models to Generative AI Systems
Abstract
How is it possible, my curious reader, that we can share our thoughts? How can it be that we discuss generative modeling, probability theory, or other interesting concepts? How come? The answer is simple: language. We communicate because the human species developed a pretty distinctive trait that allows us to formulate sounds in a very complex manner to express our ideas and experiences. At some point in our history, some people realized that we forget, we lie, and we can shout as loudly as we can, but we will not be understood farther than a few hundred meters away. The solution was a huge breakthrough: writing. This whole mumbling on my side could be summarized using one word: text. We know how to write (and read), and we can use the word text to mean language or natural language, to avoid any confusion with artificial languages like Python or formal languages.
Jakub M. Tomczak
Backmatter
Metadata
Title
Deep Generative Modeling
Author
Jakub M. Tomczak
Copyright Year
2024
Electronic ISBN
978-3-031-64087-2
Print ISBN
978-3-031-64086-5
DOI
https://doi.org/10.1007/978-3-031-64087-2
