Denoising Diffusion Probabilistic Models

1. Forward Diffusion Process

All vectors are column vectors; multi-dimensional tensors can be flattened into column vectors.

In the forward process, we gradually transform the data distribution $q(\mathbf{x}_0)$ into a distribution which is close to the standard normal $\mathcal{N}(\mathbf{0}, \mathbf{I})$.

Given noise schedule $0 < \beta_1 < \beta_2 < \cdots < \beta_T < 1$,

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right),$$

or equivalently, let $\alpha_t = 1 - \beta_t$, we have

$$\mathbf{x}_t = \sqrt{\alpha_t}\,\mathbf{x}_{t-1} + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_t, \qquad \boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$

1.1. Reparameterization

By mathematical induction, we can prove that, with $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$,

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$

That is, multiple noise additions can be expressed as a single noise addition. Since $\bar{\alpha}_t \to 0$ as the number of noise additions increases, the distribution of the data approaches a standard normal distribution.

import torch

n_steps = 500
betas = torch.linspace(0.0001, 0.02, n_steps)   # linear noise schedule
alphas = 1 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)   # \bar{\alpha}_t
expectation = alphas_cumprod.sqrt()             # coefficient of x_0 in x_t
variance = 1 - alphas_cumprod                   # variance of the accumulated noise
# Print every 10th step, rounded for readability
expectation_rounded = [round(x, 3) for x in expectation[::10].tolist()]
variance_rounded = [round(x, 3) for x in variance[::10].tolist()]
print(f"expectation: {expectation_rounded}")
print(f"variance: {variance_rounded}")
expectation: [1.0, 0.998, 0.995, 0.989, 0.982, 0.972, 0.961, 0.948, 0.934, 0.917, 0.9, 0.88, 0.86, 0.838, 0.815, 0.791, 0.767, 0.742, 0.716, 0.689, 0.662, 0.635, 0.608, 0.581, 0.554, 0.527, 0.501, 0.474, 0.449, 0.423, 0.399, 0.375, 0.352, 0.329, 0.308, 0.287, 0.267, 0.248, 0.23, 0.213, 0.196, 0.181, 0.166, 0.153, 0.14, 0.128, 0.116, 0.106, 0.096, 0.087]
variance: [0.0, 0.003, 0.01, 0.021, 0.036, 0.054, 0.076, 0.101, 0.128, 0.159, 0.191, 0.225, 0.261, 0.298, 0.335, 0.374, 0.412, 0.45, 0.488, 0.525, 0.561, 0.596, 0.63, 0.662, 0.693, 0.722, 0.749, 0.775, 0.799, 0.821, 0.841, 0.859, 0.876, 0.891, 0.905, 0.918, 0.929, 0.938, 0.947, 0.955, 0.961, 0.967, 0.972, 0.977, 0.98, 0.984, 0.986, 0.989, 0.991, 0.992]
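The closed form can be checked numerically: iterating the per-step update $t$ times should match a single application of the reparameterized formula in distribution. A minimal sketch with scalar data, comparing empirical mean and variance:

```python
import torch

torch.manual_seed(0)
n_steps = 500
betas = torch.linspace(0.0001, 0.02, n_steps)
alphas = 1 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

x0 = torch.ones(100_000)  # many copies of a scalar data point
t = 200

# Iterative noising: apply the per-step update t times
x = x0.clone()
for s in range(t):
    eps = torch.randn_like(x)
    x = alphas[s].sqrt() * x + (1 - alphas[s]).sqrt() * eps

# One-shot reparameterization
eps = torch.randn_like(x0)
x_direct = alphas_cumprod[t - 1].sqrt() * x0 + (1 - alphas_cumprod[t - 1]).sqrt() * eps

# The two should agree in mean and variance (up to Monte Carlo error)
print(x.mean().item(), x_direct.mean().item())
print(x.var().item(), x_direct.var().item())
```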

2. Training Process

Train a neural network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ to predict the noise $\boldsymbol{\epsilon}$ in

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}.$$

To minimize:

$$L(\theta) = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\ t\right)\right\|^2\right].$$
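One training step can be sketched as follows. `TinyEpsNet` is a hypothetical stand-in for the noise-prediction network (real DDPMs use a U-Net), and the batch of data is random for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy noise-prediction network: takes the noisy sample x_t
# and the timestep t, and predicts the noise epsilon.
class TinyEpsNet(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        # Append the (normalized) timestep as an extra input feature
        return self.net(torch.cat([x, t.float().unsqueeze(-1) / 500], dim=-1))

n_steps = 500
betas = torch.linspace(0.0001, 0.02, n_steps)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

model = TinyEpsNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(128, 2)               # a batch of (toy) data
t = torch.randint(0, n_steps, (128,))  # random timesteps
eps = torch.randn_like(x0)             # target noise
a_bar = alphas_cumprod[t].unsqueeze(-1)
x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # forward diffusion in one shot
loss = ((eps - model(x_t, t)) ** 2).mean()          # simple (unweighted) DDPM loss
loss.backward()
opt.step()
print(loss.item())
```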

3. Sampling Process

First, estimate the clean data:

$$\hat{\mathbf{x}}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\left(\mathbf{x}_t - \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right).$$

Then, we can use the following conditional distribution to sample $\mathbf{x}_{t-1}$ (the data at the previous time step):

$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0),\ \tilde{\beta}_t \mathbf{I}\right),$$

where

$$\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\mathbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,\mathbf{x}_t, \qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t.$$

In practice, $\mathbf{x}_0$ is usually set to $\hat{\mathbf{x}}_0$.

Substituting $\mathbf{x}_0 = \hat{\mathbf{x}}_0$ into the formula above, we have

$$\tilde{\boldsymbol{\mu}}_t = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right).$$

Therefore,

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sigma_t\,\mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),$$

with $\sigma_t^2 = \tilde{\beta}_t$ (or simply $\beta_t$).
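The full sampling loop can then be sketched as below. `eps_theta` is a placeholder for a trained noise-prediction network, and we use the simple choice $\sigma_t^2 = \beta_t$:

```python
import torch

torch.manual_seed(0)
n_steps = 500
betas = torch.linspace(0.0001, 0.02, n_steps)
alphas = 1 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

def eps_theta(x, t):
    # Placeholder for a trained noise-prediction network.
    return torch.zeros_like(x)

x = torch.randn(16, 2)  # start from pure noise x_T
for t in reversed(range(n_steps)):
    # No noise is added at the final step (t == 0)
    z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    coef = (1 - alphas[t]) / (1 - alphas_cumprod[t]).sqrt()
    sigma = betas[t].sqrt()  # sigma_t^2 = beta_t
    x = (x - coef * eps_theta(x, t)) / alphas[t].sqrt() + sigma * z
print(x.shape)
```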

4. Useful Formulas

By mathematical induction, we can prove the following closed form for a general linear Gaussian recursion.

If

$$\mathbf{x}_t = a_t\,\mathbf{x}_{t-1} + b_t\,\boldsymbol{\epsilon}_t, \qquad \boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \text{ i.i.d.},$$

then

$$\mathbf{x}_t = \bar{a}_t\,\mathbf{x}_0 + \bar{b}_t\,\bar{\boldsymbol{\epsilon}}, \qquad \bar{\boldsymbol{\epsilon}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),$$

where $\bar{a}_t = \prod_{s=1}^{t} a_s$ and $\bar{b}_t^2 = \sum_{s=1}^{t} \left(\prod_{r=s+1}^{t} a_r^2\right) b_s^2$. The induction step merges the two independent Gaussian noise terms, using the fact that $u\,\boldsymbol{\epsilon}_1 + v\,\boldsymbol{\epsilon}_2 \sim \mathcal{N}(\mathbf{0}, (u^2+v^2)\mathbf{I})$ for independent $\boldsymbol{\epsilon}_1, \boldsymbol{\epsilon}_2 \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.

In particular, for DDPM, we have $a_t = \sqrt{\alpha_t}$ and $b_t = \sqrt{1-\alpha_t}$, which give $\bar{a}_t = \sqrt{\bar{\alpha}_t}$ and $\bar{b}_t^2 = 1-\bar{\alpha}_t$.
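For DDPM's coefficients, this closed form can be verified numerically against `torch.cumprod`:

```python
import torch

n_steps = 500
betas = torch.linspace(0.0001, 0.02, n_steps)
alphas = 1 - betas
a = alphas.sqrt()        # a_t = sqrt(alpha_t)
b = (1 - alphas).sqrt()  # b_t = sqrt(1 - alpha_t)

# Accumulate the recursion x_t = a_t x_{t-1} + b_t eps_t:
# track the coefficient of x_0 and the total noise variance.
coef, var = 1.0, 0.0
for t in range(n_steps):
    coef = a[t] * coef
    var = a[t] ** 2 * var + b[t] ** 2

alphas_cumprod = torch.cumprod(alphas, dim=0)
print(coef.item(), alphas_cumprod[-1].sqrt().item())  # should match: sqrt(abar_T)
print(var.item(), (1 - alphas_cumprod[-1]).item())    # should match: 1 - abar_T
```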