\[\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\]

Part A: Fun with Diffusion.

Part 1: Sampling Loops

1.1. Implementing the Forward Process

A key part of diffusion is the forward process, which adds noise to a clean image. For this part, we take an image of the Berkeley Campanile and add progressively more noise to it using:

\[x_t = \sqrt{\bar\alpha_t} x_0 + \sqrt{1 - \bar\alpha_t} \epsilon \quad \text{where}~ \epsilon \sim \mathcal{N}(0, I)\]
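
As a rough illustration, the equation above can be sketched in NumPy. The function names and the linear beta schedule used to build \(\bar\alpha_t\) are assumptions made for this sketch; the write-up does not say which schedule the results below were produced with.

```python
import numpy as np

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t of alpha_t = 1 - beta_t (linear schedule, assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward(x0, t, alphas_cumprod, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I)."""
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps

# Usage: noise a placeholder 3x64x64 "image" at timestep 500.
rng = np.random.default_rng(0)
alphas_cumprod = make_alphas_cumprod()
x0 = np.zeros((3, 64, 64))
x_500, eps = forward(x0, 500, alphas_cumprod, rng)
```

Since \(\bar\alpha_t\) shrinks as t grows, larger timesteps keep less of the original image and more of the noise.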

Here are the results at noise levels [0, 250, 500, 750]:

Original 250 500 750
dx dx dx dx

1.2 Classical Denoising.

For comparison with later methods, we first present the results of classical denoising with a Gaussian low-pass filter:

250 500 750
dx dx dx
dx dx dx
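
For reference, a Gaussian low-pass filter of the kind used above can be sketched with NumPy alone. The kernel size and sigma here are illustrative assumptions, not the values used for the figures.

```python
import numpy as np

def gaussian_kernel1d(kernel_size=5, sigma=2.0):
    """Normalized 1-D Gaussian kernel; kernel_size is assumed to be odd."""
    half = kernel_size // 2
    x = np.arange(-half, half + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, kernel_size=5, sigma=2.0):
    """Blur an (H, W) image with a separable Gaussian, reflect-padded at the borders."""
    k = gaussian_kernel1d(kernel_size, sigma)
    half = kernel_size // 2
    padded = np.pad(img, half, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```

For a color image, the same function can be applied to each channel separately. The blur averages away noise, but it removes fine image detail along with it.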

1.3 One-Step Denoising.

For this part, we use a pretrained U-Net to denoise our image in a single step. Notice that the results are much better than the low-pass filtered ones, despite some unwanted hallucination at higher noise levels:

250 500 750
dx dx dx
dx dx dx
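
One way to read one-step denoising: assuming the U-Net predicts the noise \(\epsilon\) in \(x_t\) (an assumption of this sketch), the forward equation can be solved for \(x_0\). The function name `estimate_x0` is hypothetical.

```python
import numpy as np

def estimate_x0(x_t, eps_hat, abar_t):
    """Invert x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0, given a noise estimate."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
```

With a perfect noise estimate this recovers the clean image exactly. In practice the estimate is imperfect, and the error is amplified by the division by \(\sqrt{\bar\alpha_t}\) when \(\bar\alpha_t\) is small.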

1.4 Iterative Denoising

To achieve better results, we apply U-Net denoising multiple times, removing a little noise at each step. Here are the results at each timestep:

dx

For comparison, we include previous results here:

Gauss OneStep Iterative
dx dx dx
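
A single step of iterative denoising can be sketched as below. This uses the standard DDPM posterior-mean update generalized to a strided step from timestep t to a less noisy timestep t'. The function name is hypothetical, and treating the added variance as an optional `noise` argument is a simplification made for this sketch.

```python
import numpy as np

def denoise_step(x_t, eps_hat, abar_t, abar_prev, noise=None):
    """Move from timestep t to a less noisy timestep t' (abar_prev >= abar_t)."""
    alpha = abar_t / abar_prev   # effective alpha over the stride
    beta = 1.0 - alpha
    # Current estimate of the clean image from the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    # Blend the clean estimate with the current noisy image.
    mean = (np.sqrt(abar_prev) * beta / (1.0 - abar_t)) * x0_hat \
         + (np.sqrt(alpha) * (1.0 - abar_prev) / (1.0 - abar_t)) * x_t
    return mean if noise is None else mean + noise
```

Repeating this step over a decreasing list of timesteps, with a fresh noise prediction each time, gives the iterative loop. When `abar_prev` is 1 the step reduces to the one-step clean-image estimate.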

1.5 Diffusion Model Sample.

To sample from the diffusion model, we start from pure noise and iteratively denoise it. Here are 5 samples:

1 2 3 4 5
dx dx dx dx dx

1.6 Classifier-Free Guidance (CFG).

To achieve better samples, we use classifier-free guidance (CFG), which combines the conditional and unconditional noise estimates. Here are 5 samples with a CFG scale of 7:

1 2 3 4 5
dx dx dx dx dx
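
The way classifier-free guidance combines the two noise estimates can be sketched in a few lines. The function and argument names are hypothetical; the combination rule itself is the standard CFG formula.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale=7.0):
    """Extrapolate from the unconditional estimate toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

A scale of 0 gives the unconditional estimate and a scale of 1 gives the conditional one. Scales above 1, such as the 7 used here, push the sample further toward the prompt.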

1.7. Image2Image Translation.

i_start=1 i_start=3 i_start=5 i_start=7 i_start=10 i_start=20
dx dx dx dx dx dx

1.7.1 Editing Hand-Drawn and Web Images.

i_start=1 i_start=3 i_start=5 i_start=7 i_start=10 i_start=20
dx dx dx dx dx dx
dx dx dx dx dx dx
dx dx dx dx dx dx

1.7.2 Inpainting.

Original Mask Inpainting
dx dx dx
dx dx dx

1.7.3 Text-Conditioning.

i_start=1 i_start=3 i_start=5 i_start=7 i_start=10 i_start=20
dx dx dx dx dx dx
dx dx dx dx dx dx

1.8. Visual Anagrams

Old man Campfire
dx dx
Old man Barista
dx dx
Old man Dog
dx dx

1.9 Hybrid Images.

Hybrid of Skull and Waterfall
dx
Hybrid of Skull and Dog
dx
Hybrid of Skull and CampFire
dx

Part B: Fun with Diffusion.

Part 1: Training Single-Step Denoising UNet.

1.2. Forward

First, we implement a forward (noise-adding) process. Here’s the result of adding increasing amounts of noise to the original image:

dx
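
A minimal sketch of this noising step, assuming the noisy image is formed as \(z = x + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0, I)\). The function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(x, sigma, rng):
    """Return z = x + sigma * eps, where eps is standard Gaussian noise."""
    return x + sigma * rng.standard_normal(x.shape)

# Usage: noise a placeholder 28x28 image at increasing noise levels.
rng = np.random.default_rng(0)
x = np.zeros((28, 28))
noisy_versions = [add_noise(x, s, rng) for s in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0)]
```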

1.2.1 Training.

Now, we train our UNet to denoise an image with noise level sigma=0.5 applied to it.
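
For concreteness, a typical objective for this kind of single-step denoiser is an L2 loss between the denoised output and the clean image. The form below is an assumption about the setup, with \(D_\theta\) standing for the UNet:

\[L = \mathbb{E}_{x,\,\epsilon}\, \norm{D_\theta(x + \sigma\epsilon) - x}^2, \quad \epsilon \sim \mathcal{N}(0, I),~ \sigma = 0.5\]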

Here’s our loss curve:

dx

and here are our denoising results at epoch 5:

dx

and at epoch 1:

dx

1.2.2 Out-of-Distribution Testing.

We now apply our UNet to noise levels it wasn’t trained on. Here are the results for noise levels [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]:

dx

Notice that the denoising quality gets progressively worse as the noise level increases.

Part 2: Training a Diffusion Model.

2.1 Time-Conditioning.

We now train our UNet with time-conditioning. Here’s the loss curve:

dx
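
A sketch of how one training batch for a time-conditioned noise predictor might be built. It assumes the usual DDPM recipe: sample a random timestep per image, noise the image to that timestep, and train the UNet to predict the noise. The function name and the choice to normalize t to [0, 1) are assumptions.

```python
import numpy as np

def make_training_batch(x0, alphas_cumprod, rng):
    """Return (x_t, normalized t, eps) for a batch of clean images x0 of shape (N, ...)."""
    n = x0.shape[0]
    num_steps = len(alphas_cumprod)
    t = rng.integers(0, num_steps, size=n)      # one random timestep per image
    eps = rng.standard_normal(x0.shape)         # regression target for the UNet
    # Reshape abar_t so it broadcasts over the image dimensions.
    abar = alphas_cumprod[t].reshape(n, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, t / num_steps, eps
```

The loss would then be the mean squared error between the UNet's output on `(x_t, t)` and `eps`.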

Here are our results from sampling the time-conditioned UNet at selected epochs:

Epoch 5 Epoch 20
dx dx

2.4 Class-Conditioning

Now we train our UNet with class-conditioning (on top of time-conditioning). Here’s the loss curve:

dx
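
One common way to feed class labels to a UNet is a one-hot vector that is randomly zeroed out during training, so the same network can also make unconditional predictions for classifier-free guidance. The sketch below assumes that scheme; the function name and the drop probability are illustrative and not taken from this write-up.

```python
import numpy as np

def class_condition(labels, num_classes, p_uncond, rng):
    """One-hot encode labels, zeroing each row with probability p_uncond."""
    one_hot = np.eye(num_classes)[labels]
    keep = rng.random(len(labels)) >= p_uncond   # False means: drop the conditioning
    return one_hot * keep[:, None]

# Usage: condition a batch of four digit labels, dropping about 10% of them.
rng = np.random.default_rng(0)
cond = class_condition(np.array([3, 1, 4, 1]), num_classes=10, p_uncond=0.1, rng=rng)
```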

Here are our results from sampling the class-conditioned UNet at selected epochs:

Epoch 5 Epoch 20
dx dx
dx dx
dx dx
dx dx
dx dx

Bells & Whistles

Here are our GIFs for the time-conditioned UNet:

Epoch 1 Epoch 5 Epoch 20
dx dx dx

and for the class-conditioned UNet:

Epoch 1 Epoch 5 Epoch 20
dx dx dx