Score-based Generative Models through SDEs
1. Notations
The meaning of $\nabla\cdot[\cdot]$:
For a scalar function $\phi(x):\mathbb{R}^d\to\mathbb{R}$, $\nabla\phi(x)$ is the gradient.
For a vector function $\phi(x):\mathbb{R}^d\to\mathbb{R}^d$, $\nabla\cdot\phi(x)\in\mathbb{R}$ is the divergence, defined as the sum of the partial derivatives of the components:
$$\nabla\cdot\phi(x)=\sum_{i=1}^d\frac{\partial\phi_i}{\partial x_i}$$
For a matrix function $\phi(x):\mathbb{R}^d\to\mathbb{R}^{d\times d}$, the divergence is a vector, defined as:
$$\nabla\cdot\phi(x)=\left(\sum_{j=1}^d\frac{\partial\phi_{1j}}{\partial x_j},\;\dots,\;\sum_{j=1}^d\frac{\partial\phi_{dj}}{\partial x_j}\right)^\top$$
or, written in a more general form:
$$\nabla\cdot\phi(x):=\big(\nabla\cdot\phi_1(x),\dots,\nabla\cdot\phi_d(x)\big)^\top$$
where $\phi_i(x)$ is the $i$th row of $\phi$.
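The vector-field case above can be sanity-checked numerically. The sketch below uses a linear field $\phi(x)=Ax$ (an illustrative choice, not from the text), whose divergence is exactly $\operatorname{tr}(A)$:

```python
import numpy as np

def divergence(phi, x, eps=1e-5):
    """Central-difference divergence of a vector field phi: R^d -> R^d,
    following the definition sum_i d(phi_i)/d(x_i)."""
    d = x.shape[0]
    div = 0.0
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        div += (phi(x + e)[i] - phi(x - e)[i]) / (2 * eps)
    return div

A = np.array([[1.0, 2.0], [3.0, 4.0]])
phi = lambda x: A @ x          # linear field: divergence = trace(A)
x = np.array([0.5, -1.0])
print(divergence(phi, x))      # ≈ trace(A) = 5
```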
2. Score-based Diffusion through SDE (Unconditional)
SDE of the forward process (noising):
$$dx=f(x,t)\,dt+g(x,t)\,dw,\qquad f:\mathbb{R}^d\times[0,T]\to\mathbb{R}^d,\quad g:\mathbb{R}^d\times[0,T]\to\mathbb{R}^{d\times d}$$
SDE of the reverse process (denoising), where $\bar{w}$ is a reverse-time Wiener process:
$$dx=\Big(f(x,t)-\nabla\cdot\big[g(x,t)g(x,t)^\top\big]-g(x,t)g(x,t)^\top\nabla_x\ln p_t(x)\Big)dt+g(x,t)\,d\bar{w}$$
Probability flow ODE:
$$dx=\Big(f(x,t)-\tfrac{1}{2}\nabla\cdot\big[g(x,t)g(x,t)^\top\big]-\tfrac{1}{2}g(x,t)g(x,t)^\top\nabla_x\ln p_t(x)\Big)dt$$
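A forward trajectory under such an SDE can be simulated with the Euler–Maruyama scheme. The drift $f(x,t)=-x/2$, constant $g=1$, and horizon below are illustrative choices (not from the text), picked so each coordinate is an Ornstein–Uhlenbeck process with a known marginal variance:

```python
import numpy as np

def euler_maruyama(x0, f, g, T=1.0, n_steps=1000, seed=0):
    """Simulate dx = f(x,t) dt + g(x,t) dw forward from x0 to time T."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dt = T / n_steps
    for k in range(n_steps):
        t = k * dt
        # one Euler-Maruyama step: deterministic drift + sqrt(dt)-scaled noise
        x = x + f(x, t) * dt + g(x, t) * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# each of the 5000 coordinates is an independent OU path started at 0;
# at T=1 the exact marginal variance is 1 - e^{-1} ≈ 0.632
x_T = euler_maruyama(np.zeros(5000), f=lambda x, t: -0.5 * x, g=lambda x, t: 1.0)
print(x_T.var())
```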
3. Training (Unconditional)
- Sliced Score Matching:
$$\theta^*=\arg\min_\theta\mathbb{E}_t\Big\{\lambda(t)\,\mathbb{E}_{x(0)}\mathbb{E}_{x(t)}\mathbb{E}_{v\sim p_v}\Big[v^\top\nabla_x s_\theta(x(t),t)\,v+\tfrac{1}{2}\big(v^\top s_\theta(x(t),t)\big)^2\Big]\Big\}$$
- Denoising Score Matching:
$$\theta^*=\arg\min_\theta\mathbb{E}_{t\sim U(0,T)}\,\lambda(t)\,\mathbb{E}_{x_0\sim p_0}\mathbb{E}_{x\sim p_{0t}(x\mid x_0)}\Big[\big\|s_\theta(x,t)-\nabla_x\ln p_{0t}(x\mid x_0)\big\|_2^2\Big]$$
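The denoising score matching objective can be checked on a toy setup where the optimum is known in closed form. The setup below is an assumption for illustration: 1-D data $x_0\sim\mathcal N(0,1)$, corruption $x=x_0+\sigma\epsilon$, so $p_t=\mathcal N(0,1+\sigma^2)$ and the true score is $-x/(1+\sigma^2)$. Fitting a linear score model $s_\theta(x)=\theta x$ to the DSM target reduces to least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.7
x0 = rng.standard_normal(200_000)          # samples from p_0 = N(0, 1)
x = x0 + sigma * rng.standard_normal(x0.shape)
target = -(x - x0) / sigma**2              # nabla_x ln p_0t(x | x0)
theta = (x @ target) / (x @ x)             # least-squares fit of s(x) = theta * x
print(theta, -1 / (1 + sigma**2))          # both ≈ -0.671: DSM recovers the true score
```

The point is that minimizing the *conditional* score-matching loss recovers the score of the *marginal* $p_t$, which is what the sampler needs.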
4. Score-based Diffusion through SDE (Conditional)
$$dx=\Big(f(x,t)-\nabla\cdot\big[g(x,t)g(x,t)^\top\big]-g(x,t)g(x,t)^\top\nabla_x\ln p_t(x\mid y)\Big)dt+g(x,t)\,d\bar{w}$$
$$\nabla_x\ln p_t(x\mid y)=\nabla_x\ln\frac{p_t(y\mid x)\,p_t(x)}{p_t(y)}=\nabla_x\ln p_t(y\mid x)+\nabla_x\ln p_t(x)$$
$$dx=\Big(f(x,t)-\nabla\cdot\big[g(x,t)g(x,t)^\top\big]-g(x,t)g(x,t)^\top\big(\nabla_x\ln p_t(y\mid x)+\nabla_x\ln p_t(x)\big)\Big)dt+g(x,t)\,d\bar{w}$$
We can train a neural network $c(x,t)$ to approximate $\ln p_t(y\mid x)$ and differentiate it.
We can also use prior knowledge to determine $\nabla_x\ln p_t(y\mid x)$ directly.
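The Bayes-rule decomposition of the conditional score can be verified on a conjugate-Gaussian toy model (all densities below are illustrative assumptions): $p(x)=\mathcal N(0,1)$, $p(y\mid x)=\mathcal N(y;x,1)$, hence $p(x\mid y)=\mathcal N(y/2,\,1/2)$:

```python
def score_prior(x):            # nabla_x ln p(x),  p(x) = N(0, 1)
    return -x

def score_likelihood(x, y):    # nabla_x ln p(y|x),  p(y|x) = N(y; x, 1)
    return y - x

def score_posterior(x, y):     # nabla_x ln p(x|y),  p(x|y) = N(y/2, 1/2)
    return -(x - y / 2) / 0.5

x, y = 0.3, 1.2
lhs = score_posterior(x, y)
rhs = score_likelihood(x, y) + score_prior(x)
print(lhs, rhs)  # equal: posterior score = likelihood score + prior score
```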
5. Special Case
If $\exists\,\tilde{g}(t)\in\mathbb{R}$ s.t. $g(x,t)=\tilde{g}(t)\cdot I$, then $gg^\top=\tilde{g}(t)^2 I$ no longer depends on $x$, the divergence term vanishes, and the reverse process simplifies to:
$$dx=\big(f(x,t)-\tilde{g}(t)^2\,\nabla_x\ln p_t(x)\big)dt+\tilde{g}(t)\,d\bar{w}$$
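This special-case reverse SDE can be integrated backwards with Euler–Maruyama. The toy forward process below is an assumption for illustration: $dx=-\tfrac{x}{2}dt+dw$ with $x_0=0$, so $p_t=\mathcal N(0,v_t)$ with $v_t=1-e^{-t}$, and the exact score $s(x,t)=-x/v_t$ stands in for a learned network $s_\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, t_end, n_steps, n_samples = 2.0, 1e-2, 2000, 20_000
dt = (T - t_end) / n_steps

v = lambda t: 1.0 - np.exp(-t)                       # Var[x_t] under the toy forward SDE
x = rng.standard_normal(n_samples) * np.sqrt(v(T))   # start from the prior x_T ~ p_T
t = T
for _ in range(n_steps):
    score = -x / v(t)                  # exact score, replacing s_theta(x, t)
    drift = -0.5 * x - score           # f(x,t) - g~(t)^2 * score, with g~ = 1
    # backward-in-time Euler-Maruyama step of the reverse SDE
    x = x - drift * dt + np.sqrt(dt) * rng.standard_normal(n_samples)
    t -= dt
print(x.mean(), x.var())  # samples collapse back toward x_0 = 0 (var ≈ v(t_end) ≈ 0.01)
```

Integration stops at a small $t_{\text{end}}>0$ because the score $-x/v_t$ blows up as $v_t\to 0$, a standard practical workaround.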
6. Denoising Score Matching
Assume the forward diffusion process can be written as $x_t=a_t x_0+b_t\epsilon,\ \epsilon\sim\mathcal{N}(0,I)$; then
$$x_t\sim\mathcal{N}(a_t x_0,\,b_t^2 I)$$
Minimize
$$\mathbb{E}_{t,\,x_0,\,x_t\sim p_{0t}(x_t\mid x_0)}\big\|s_\theta(x_t,t)-\nabla_{x_t}\ln p_{0t}(x_t\mid x_0)\big\|_2^2$$
- Equivalence of Epsilon Model and Score Model
$$\text{score}=\nabla_{x_t}\ln p_{0t}(x_t\mid x_0)=-\frac{1}{b_t^2}(x_t-a_t x_0)=-\frac{\epsilon}{b_t}$$
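The identity holds exactly, not just in expectation, which a quick numerical check makes concrete (the values of $a_t$ and $b_t$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a_t, b_t = 0.8, 0.6                    # illustrative schedule values
x0 = rng.standard_normal(1000)
eps = rng.standard_normal(1000)
x_t = a_t * x0 + b_t * eps             # forward perturbation x_t = a_t x_0 + b_t eps
score = -(x_t - a_t * x0) / b_t**2     # nabla ln p_0t(x_t | x_0)
assert np.allclose(score, -eps / b_t)  # score = -eps / b_t, elementwise
```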
- Unconditional Score Matching
$$\arg\min\big\|b_t^{-1}\epsilon_\theta(x_t,t)-b_t^{-1}\epsilon\big\|_2^2\iff\arg\min\big\|\epsilon_\theta(x_t,t)-\epsilon\big\|_2^2$$
- Conditional (Loss Guidance) Score Matching
$$\arg\min\big\|b_t^{-1}\epsilon_\theta(x_t,t)-b_t^{-1}\epsilon-\nabla_{x_t}l(x_t,x_0)\big\|_2^2\iff\arg\min\big\|\epsilon_\theta(x_t,t)-\epsilon-b_t\nabla_{x_t}l(x_t,x_0)\big\|_2^2$$
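The translation between the score and epsilon parameterizations of loss guidance is just a rescaling by $-b_t$. A minimal sketch, with an illustrative quadratic guidance loss $l(x)=\|x-y\|^2/2$ that is not from the text:

```python
import numpy as np

b_t = 0.5
x_t = np.array([0.2, -0.4])
y = np.array([1.0, 1.0])
grad_l = x_t - y                       # nabla_x l(x_t) for l(x) = ||x - y||^2 / 2
eps = np.array([0.3, 0.1])
score_guided = -eps / b_t - grad_l     # guided score target
eps_guided = -b_t * score_guided       # convert back to epsilon parameterization
print(eps_guided)                      # equals eps + b_t * grad_l
```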
7. VPSDE (Continuous DDPM) (Unconditional)
$$P(x_t\mid x_{t-1})=\mathcal{N}\big(x_t;\,\sqrt{1-\beta_t}\,x_{t-1},\,\beta_t I\big)$$
$$x_t=\sqrt{1-\beta_t}\,x_{t-1}+\sqrt{\beta_t}\,\epsilon,\qquad\epsilon\sim\mathcal{N}(0,I)$$
$$x_{t+\Delta t}-x_t=\sqrt{1-\beta_{t+\Delta t}}\,x_t+\sqrt{\beta_{t+\Delta t}}\,\epsilon-x_t$$
Because $\sqrt{1-x}=1-\frac{x}{2}+o(x)$, we have:
$$x_{t+\Delta t}-x_t=-\frac{\beta_{t+\Delta t}}{2}\,x_t+\sqrt{\beta_{t+\Delta t}}\,\epsilon+o(\beta_{t+\Delta t})\,x_t$$
Interpreting $\beta_{t+\Delta t}=\beta(t+\Delta t)\,\Delta t$ and letting $\Delta t\to 0$ gives the SDE of the forward process (noising):
$$dx=-\frac{\beta_t}{2}\,x\,dt+\sqrt{\beta_t}\,dw$$
SDE of the reverse process (denoising):
$$dx=\Big(-\frac{\beta_t}{2}\,x-\beta_t\,\nabla_x\ln p_t(x)\Big)dt+\sqrt{\beta_t}\,d\bar{w}$$
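The continuous limit rests on $\sqrt{1-\beta}\approx 1-\beta/2$: for one step, the DDPM update and the Euler step of the VPSDE differ only at order $\beta^2$. A quick numerical check (the values of $\beta$, $x$, and $\epsilon$ are illustrative):

```python
import numpy as np

beta, x, eps = 1e-3, 2.0, 0.7
ddpm_step = np.sqrt(1 - beta) * x + np.sqrt(beta) * eps   # one discrete DDPM update
sde_step = x - 0.5 * beta * x + np.sqrt(beta) * eps       # Euler step of dx = -(beta/2) x dt + sqrt(beta) dw
print(abs(ddpm_step - sde_step))  # on the order of beta^2, i.e. the o(beta) remainder
```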
Parameter comparison:

| model | beta | n_step |
|---|---|---|
| DDPM | 0.0001–0.02 | 1000 |
| VPSDE | 0.1–20 | 1000 |
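The two rows are consistent under the convention of Song et al.'s VPSDE (stated here as an interpretation, since the table itself does not say so): the continuous $\beta(t)$ range is the discrete per-step beta range scaled by the number of steps $N$:

```python
N = 1000
beta_min_ddpm, beta_max_ddpm = 1e-4, 0.02
# scaling the discrete DDPM betas by N recovers the VPSDE range 0.1-20
print(beta_min_ddpm * N, beta_max_ddpm * N)
```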
8. VESDE
$$x_t\sim\mathcal{N}(x_0,\sigma_t^2 I),\qquad\sigma_t=\sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$$
$$\begin{aligned}
dx&=g(t)\,dw\\
x_t&=x_0+\int_0^t g(s)\,dw_s\\
\operatorname{Var}[x_t]&=\int_0^t g(s)^2\,ds=:\sigma_t^2\\
g(t)&=\sqrt{\frac{d\sigma_t^2}{dt}}=\sigma_t\sqrt{2\ln(\sigma_{\max}/\sigma_{\min})}
\end{aligned}$$
In the discrete case:
$$\begin{aligned}
x_t&=x_{t-1}+g(t)\,\epsilon_t,\qquad\epsilon_t\sim\mathcal{N}(0,I)\\
\sigma_t^2&=\sigma_{t-1}^2+g(t)^2\\
g(t)&=\sqrt{\sigma_t^2-\sigma_{t-1}^2}
\end{aligned}$$
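In the discrete case the per-step noise scales $g(t)=\sqrt{\sigma_t^2-\sigma_{t-1}^2}$ should accumulate back to $\operatorname{Var}[x_t]=\sigma_t^2$, which is easy to verify for a geometric schedule ($\sigma_{\min}$, $\sigma_{\max}$, and $N$ below are illustrative choices):

```python
import numpy as np

sigma_min, sigma_max, N = 0.01, 50.0, 1000
t = np.arange(N + 1) / N
sigma = sigma_min * (sigma_max / sigma_min) ** t     # geometric sigma schedule
g = np.sqrt(sigma[1:] ** 2 - sigma[:-1] ** 2)        # per-step noise scale g(t)
# accumulated variance after each step: sigma_0^2 + sum of g^2 so far
var_accumulated = sigma[0] ** 2 + np.cumsum(g ** 2)
assert np.allclose(var_accumulated, sigma[1:] ** 2)  # telescopes back to sigma_t^2
```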