Assignment: Denoising Diffusion on Two-Pixel Images

The field of image synthesis has evolved rapidly in recent years. After auto-regressive models, Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs), we have now entered the era of diffusion models. A key advantage of diffusion models over other generative approaches is that they largely avoid mode collapse, which lets them produce a diverse range of images. Because real images are high-dimensional, it is impractical to sample and observe all of their modes directly. Our objective is to study denoising diffusion on two-pixel images, so that we can better understand how modes are generated and visualize the dynamics and the distribution in a 2D space.

1 Introduction

Diffusion models operate through a two-step process (Fig. 1): forward and reverse diffusion.

Figure 1: Diffusion models have a forward process that successively adds noise to a clean image $x_0$ and a backward process that successively denoises an almost pure-noise image $x_T$ [2].

During the forward diffusion process, noise is incrementally added to the original data: each time step $t$ mixes in a little more Gaussian noise, and after many steps the data resembles pure Gaussian noise. Letting $\epsilon_{t-1} \sim \mathcal{N}(0, I)$ denote standard Gaussian noise, we parameterize the forward process as $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)$, i.e.,

$$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_{t-1}, \qquad (1)$$
$$0 < \beta_t < 1. \qquad (2)$$

Composing all the steps, we can write the forward process in closed form as a single step:

$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad (3)$$
$$\alpha_t = 1 - \beta_t, \qquad (4)$$
$$\bar\alpha_t = \alpha_1 \alpha_2 \cdots \alpha_t. \qquad (5)$$

As $t \to \infty$, $\bar\alpha_t \to 0$, so $x_t$ approaches an isotropic Gaussian distribution. We schedule $\beta_1 < \beta_2 < \dots < \beta_T$, because larger update steps are more appropriate once the image already contains significant noise.
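As a sanity check of Eq. (3), the NumPy sketch below applies Eq. (1) step by step to many copies of one two-pixel image. It then compares the empirical mean and variance with the closed form. The linear schedule from 1e-4 to 0.2 and the function names `q_sample` and `iterate_forward` are hypothetical illustrations, not part of the starter code.

```python
import numpy as np

def q_sample(x0, t, alphas_cumprod, rng):
    """Closed-form forward step, Eq. (3): x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
    t is 1-indexed, so alphas_cumprod[t - 1] holds abar_t."""
    abar_t = alphas_cumprod[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps, eps

def iterate_forward(x0, t, betas, rng):
    """Apply Eq. (1) t times: x_k = sqrt(1 - beta_k) x_{k-1} + sqrt(beta_k) eps."""
    x = x0.copy()
    for k in range(t):
        x = np.sqrt(1.0 - betas[k]) * x + np.sqrt(betas[k]) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.2, T)           # hypothetical increasing schedule
alphas_cumprod = np.cumprod(1.0 - betas)     # Eqs. (4)-(5)

x0 = np.tile([0.75, -0.45], (200_000, 1))    # many copies of one 2-pixel image
t = 20
x_iter = iterate_forward(x0, t, betas, rng)
x_closed, _ = q_sample(x0, t, alphas_cumprod, rng)

print(x_iter.mean(0), x_closed.mean(0))     # both close to sqrt(abar_t) * x0
print(x_iter.var(0), x_closed.var(0))       # both close to 1 - abar_t
```

Both estimates should agree up to Monte Carlo error. This is why training can jump directly to any $t$ instead of simulating the chain.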


The reverse diffusion process, in contrast, has the model learn to reconstruct the original data from a noisy version. A neural network is trained to iteratively remove the noise, thereby recovering the original data. Once it has learned this denoising process, the model can generate new samples that closely resemble the training data.

We model each step of the reverse process as a Gaussian distribution:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big). \qquad (6)$$

Notably, when conditioned on $x_0$, the reverse conditional probability is tractable:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1}; \tilde\mu_t, \tilde\beta_t I\big), \qquad (7)$$

where, applying Bayes' rule and skipping the intermediate steps (see [8] for a reader-friendly derivation), we have:

$$\tilde\mu_t = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_t\right), \qquad (8)$$

with $\epsilon_t$ the noise in Eq. (3) that produced $x_t$ from $x_0$.
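Eq. (8) can be checked numerically against the equivalent form of the posterior mean written in terms of $x_0$ from [2]:

$\tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t$.

The NumPy sketch below is illustrative only. The function names and the linear schedule are assumptions, not the starter code's API.

```python
import numpy as np

def posterior_mean_from_eps(x_t, eps, t, alphas, alphas_cumprod):
    """Eq. (8): (x_t - (1 - alpha_t) / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)."""
    a, abar = alphas[t - 1], alphas_cumprod[t - 1]
    return (x_t - (1.0 - a) / np.sqrt(1.0 - abar) * eps) / np.sqrt(a)

def posterior_mean_from_x0(x_t, x0, t, alphas, alphas_cumprod):
    """Equivalent Bayes-rule form: a weighted combination of x0 and x_t."""
    a, abar = alphas[t - 1], alphas_cumprod[t - 1]
    abar_prev = alphas_cumprod[t - 2] if t > 1 else 1.0   # convention: abar_0 = 1
    beta = 1.0 - a
    return (np.sqrt(abar_prev) * beta / (1.0 - abar)) * x0 \
        + (np.sqrt(a) * (1.0 - abar_prev) / (1.0 - abar)) * x_t

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.2, 50)      # hypothetical schedule
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)

t = 30
x0 = rng.standard_normal((4, 2))
eps = rng.standard_normal((4, 2))
x_t = np.sqrt(alphas_cumprod[t - 1]) * x0 + np.sqrt(1.0 - alphas_cumprod[t - 1]) * eps  # Eq. (3)

mu_eps = posterior_mean_from_eps(x_t, eps, t, alphas, alphas_cumprod)
mu_x0 = posterior_mean_from_x0(x_t, x0, t, alphas, alphas_cumprod)
print(np.max(np.abs(mu_eps - mu_x0)))  # near machine precision: the two forms agree
```

Substituting $x_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_t)/\sqrt{\bar\alpha_t}$ into the $x_0$ form recovers Eq. (8) exactly. This is why a network that predicts $\epsilon_t$ is enough to parameterize the reverse mean.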

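Following the VAE [3], we optimize the negative log-likelihood through its variational bound, which compares $\tilde\mu_t$ with $\mu_\theta(x_t, t)$ (see [6] for derivations). Parameterizing $\mu_\theta$ in the same form as Eq. (8), with $\epsilon_t$ replaced by a network prediction $\epsilon_\theta(x_t, t)$, and dropping the weighting terms as in [2], we obtain the following objective function:

$$L = \mathbb{E}_{t \sim [1, T],\, x_0,\, \epsilon_t}\left[\lVert \epsilon_t - \epsilon_\theta(x_t, t) \rVert^2\right]. \qquad (9)$$

In other words, the diffusion model $\epsilon_\theta$ predicts, from $x_t$ at time step $t$, the noise that was added to $x_0$.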

Figure 2 panels: (a) many-pixel images; (b) two-pixel images.

Figure 2: For many-pixel images, the image distribution is difficult to estimate and its visualization is distorted. For two-pixel images, it is simple to collect and straightforward to visualize. The former requires dimensionality reduction, e.g., embedding the values of many pixels into 3 dimensions, whereas the latter can be plotted directly in 2D, with one dimension for each of the two pixels. Illustrated is a Gaussian mixture with two density peaks, at [-0.35, 0.65] and [0.75, -0.45], each with sigma 0.1 and with weights [0.35, 0.65] respectively. Taking larger values as lighter, about twice as many images in our two-pixel world (weight 0.65 vs. 0.35) have the lighter pixel on the left.
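To make the caption's numbers concrete, here is a minimal NumPy sketch that samples the illustrated mixture. It is a hypothetical stand-in for gmm.py, whose actual interface may differ. Treating larger values as lighter, it counts how often the left pixel is the larger one.

```python
import numpy as np

# Parameters taken from the Figure 2 caption; gmm.py may organize them differently.
MEANS = np.array([[-0.35, 0.65], [0.75, -0.45]])
SIGMA = 0.1
WEIGHTS = np.array([0.35, 0.65])

def sample_two_pixel_images(n, rng):
    """Draw n two-pixel images [left, right] together with their component labels."""
    labels = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)      # pick a mixture component
    x = MEANS[labels] + SIGMA * rng.standard_normal((n, 2))   # isotropic Gaussian spread
    return x, labels

rng = np.random.default_rng(0)
x, y = sample_two_pixel_images(100_000, rng)
frac_left_larger = np.mean(x[:, 0] > x[:, 1])
print(frac_left_larger)  # close to 0.65: the [0.75, -0.45] component dominates
```

The two peaks are far apart relative to sigma, so almost every sample from a component keeps that component's pixel ordering.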

In this homework, we study denoising diffusion on two-pixel images, where we can fully visualize the diffusion dynamics over learned image distributions in 2D (Fig. 2). Sec. 2 describes our model step by step, along with the code you need to write to complete it. Sec. 3 describes the starter code. Sec. 4 lists the results and answers you need to submit.


2 Denoising Diffusion Probabilistic Models (DDPM) on 2-Pixel Images

Diffusion models not only generate realistic images but also capture the underlying distribution of the training data. However, this probability density function (PDF) can be hard to estimate for many-pixel images, and its visualization is highly distorted. For two-pixel images, both are simple and direct (Fig. 2). Consider an image with only two pixels, a left pixel and a right pixel. Our two-pixel world contains two kinds of images: those whose left pixel is lighter than the right pixel, and vice versa. The entire image distribution can be modeled by a Gaussian mixture with two peaks in 2D, with each dimension corresponding to one pixel.

Let us develop DDPM [2] for our special two-pixel image collection.

2.1 Diffusion Step and Class Embedding

We use a Gaussian Fourier feature embedding for the diffusion step $t$ (written $x$ below):

$$x_{\text{emb}} = \left[\sin(2\pi w_1 x), \cos(2\pi w_1 x), \ldots, \sin(2\pi w_n x), \cos(2\pi w_n x)\right], \quad w_i \sim \mathcal{N}(0, 1),\ i = 1, \ldots, n. \qquad (10)$$
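Eq. (10) amounts to a few lines of array code. The NumPy sketch below assumes the frequencies $w$ are drawn once and reused for every step, and that the raw step value is fed in without normalization. Both are assumptions, as is the function name; the starter code may differ.

```python
import numpy as np

def gaussian_fourier_embedding(t, w):
    """Eq. (10): map steps t (shape [B]) to sin/cos features at frequencies w (shape [n]).

    Returns shape [B, 2n]. The sines come first, then the cosines, rather than
    interleaved as written in Eq. (10); a following dense layer can absorb any
    fixed reordering of the features.
    """
    proj = 2.0 * np.pi * np.outer(t, w)                        # [B, n]
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

rng = np.random.default_rng(0)
n = 8
w = rng.standard_normal(n)          # w_i ~ N(0, 1), sampled once (assumption)
t = np.array([1.0, 10.0, 50.0])
emb = gaussian_fourier_embedding(t, w)
print(emb.shape)                    # (3, 16)
```

Random frequencies give the network a rich, bounded encoding of a scalar step. A dense layer then maps this encoding to the feature dimension of each block.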

For the class embedding, we simply use linear layers to project the one-hot encoding of the class labels into a latent space. This part is already provided in the starter code, so you do not need to implement it.
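For intuition only, a one-hot projection through a single linear layer is equivalent to looking up one row of the weight matrix. The NumPy sketch below reserves an extra label index for the null label $y = \varnothing$ used later for classifier-free guidance. That reserved index, the single layer, and all names are hypothetical. They need not match the provided code.

```python
import numpy as np

def class_embedding(y, num_classes, weight, bias):
    """Project one-hot labels y (shape [B]) to a latent space with one linear layer.

    Hypothetical convention: label value num_classes stands for the null label,
    so weight has num_classes + 1 rows and shape [num_classes + 1, embed_dim].
    """
    one_hot = np.eye(num_classes + 1)[y]   # [B, num_classes + 1]
    return one_hot @ weight + bias           # [B, embed_dim]

rng = np.random.default_rng(0)
num_classes, embed_dim = 2, 16
weight = 0.1 * rng.standard_normal((num_classes + 1, embed_dim))
bias = np.zeros(embed_dim)
y = np.array([0, 1, 2])                      # 2 plays the role of the null label here
c = class_embedding(y, num_classes, weight, bias)
print(c.shape)                               # (3, 16)
```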

2.2 Conditional UNet

Figure 3: Sample conditional UNet architecture. Note how the diffusion step and the class/text conditional embeddings are fused with the conv blocks of the image feature maps. For simplicity, we do not add the attention module in our 2-pixel use case.

We use a UNet (Fig. 3) that takes as input the time step $t$ and the noised image $x_t$, along with the class label $y$ if it is provided, and outputs the predicted noise. Each of the encoding and decoding pathways consists of only two blocks. To incorporate the step into the UNet features, we apply a dense linear layer that transforms the step embedding to match the image feature dimension. A similar embedding approach can be used for class label conditioning. The detailed architecture is as follows.

1. Encoding block 1: Conv1D with kernel size 2 + Dense + GroupNorm with 4 groups
2. Encoding block 2: Conv1D with kernel size 1 + Dense + GroupNorm with 32 groups
3. Decoding block 1: ConvTranspose1d with kernel size 1 + Dense + GroupNorm with 4 groups
4. Decoding block 2: ConvTranspose1d with kernel size 1

We use SiLU [1] as our activation function. When adding class conditioning, we handle it similarly to the diffusion step.
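To show how the step embedding is fused into the feature maps, the NumPy sketch below implements only encoding block 1. The block is a kernel-size-2 convolution followed by an added dense projection of the embedding, then GroupNorm with 4 groups and SiLU. Several details are assumptions, not the assignment's reference design: the 2-pixel image is viewed as one channel of length 2, the channel count is 32, GroupNorm has no learnable affine part, and SiLU comes after the normalization. Your ddpm.py would use your framework's layers instead.

```python
import numpy as np

def silu(x):
    """SiLU [1]: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def group_norm(h, num_groups, eps=1e-5):
    """GroupNorm without the affine part: normalize over (channels in group, length)."""
    B, C, L = h.shape
    g = h.reshape(B, num_groups, C // num_groups, L)
    mean = g.mean(axis=(2, 3), keepdims=True)
    var = g.var(axis=(2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(B, C, L)

def encoding_block_1(x, emb, conv_w, conv_b, dense_w, dense_b, num_groups=4):
    """Conv (kernel 2) + Dense(step embedding) + GroupNorm(4) + SiLU.

    x: [B, 2] images viewed as 1 channel of length 2; emb: [B, E];
    conv_w: [C, 1, 2]; conv_b: [C]; dense_w: [E, C]; dense_b: [C].
    """
    x = x[:, None, :]                                         # [B, 1, 2]
    # A kernel-size-2 convolution over a length-2 signal (no padding) leaves length 1.
    h = np.einsum("bik,cik->bc", x, conv_w)[:, :, None] + conv_b[None, :, None]  # [B, C, 1]
    # Dense layer maps the embedding to C channels; broadcast-add over the length axis.
    h = h + (emb @ dense_w + dense_b)[:, :, None]
    return silu(group_norm(h, num_groups))

rng = np.random.default_rng(0)
B, E, C = 5, 16, 32                                           # C must be divisible by the group count
x = rng.standard_normal((B, 2))
emb = rng.standard_normal((B, E))
h = encoding_block_1(x, emb, rng.standard_normal((C, 1, 2)), np.zeros(C),
                     0.1 * rng.standard_normal((E, C)), np.zeros(C))
print(h.shape)                                                # (5, 32, 1)
```

Adding the projected embedding before normalization lets every channel shift by an amount that depends on $t$. This is the fusion pattern Fig. 3 depicts; class conditioning can be added the same way.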

Your to-do: Finish the model architecture and forward function in ddpm.py.

2.3 Beta Scheduling and Variance Estimation

We adopt the cosine beta schedule of [4], which gives better results than the linear schedule of the original DDPM [2]:

$$\bar\alpha_t = \frac{f(t)}{f(0)}, \qquad (11)$$
$$f(t) = \cos^2\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right), \qquad (12)$$

where $s$ is a small offset. However, to estimate $\Sigma_\theta(x_t, t)$ we follow the simpler posterior variance estimate of [2] rather than the learned posterior variance method of [4].
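A minimal NumPy sketch of Eqs. (11)-(12) follows. It recovers per-step betas via $\beta_t = 1 - \bar\alpha_t / \bar\alpha_{t-1}$, which follows from Eqs. (4)-(5). The offset $s = 0.008$ and the clip of $\beta_t$ at 0.999 are the choices reported in [4]; whether the starter code uses the same values is an assumption, as are the function names.

```python
import numpy as np

def cosine_alphas_cumprod(T, s=0.008):
    """Eqs. (11)-(12): abar_t = f(t) / f(0), f(t) = cos^2(((t/T + s) / (1 + s)) * pi/2), t = 0..T."""
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def cosine_betas(T, s=0.008, max_beta=0.999):
    """beta_t = 1 - abar_t / abar_{t-1} for t = 1..T, clipped to avoid a singular last step."""
    abar = cosine_alphas_cumprod(T, s)
    return np.clip(1.0 - abar[1:] / abar[:-1], 0.0, max_beta)

betas = cosine_betas(50)
print(betas[:3], betas[-3:])   # small near t = 1, large near t = T
```

Without the clip, $\beta_T$ would be essentially 1 because $f(T) = \cos^2(\pi/2) \approx 0$.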
For convenience, we declare some global variables that are handy during sampling and training. They are defined in the code as follows:

1. betas: $\beta_t$
2. alphas: $\alpha_t = 1 - \beta_t$
3. alphas_cumprod: $\bar\alpha_t = \prod_{i=1}^{t} \alpha_i$
4. posterior_variance: $\Sigma_\theta(x_t, t) = \sigma_t^2 = \tilde\beta_t = \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}\,\beta_t$
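These variables can all be derived from the betas array. The NumPy sketch below adds one extra helper, `alphas_cumprod_prev` (holding $\bar\alpha_{t-1}$ with the convention $\bar\alpha_0 = 1$). That helper and the dictionary layout are hypothetical choices, not requirements of utils.py.

```python
import numpy as np

def make_schedule_buffers(betas):
    """Derive the Sec. 2.3 variables from a 1-D array with betas[t-1] = beta_t."""
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)                                  # abar_t
    alphas_cumprod_prev = np.concatenate([[1.0], alphas_cumprod[:-1]])   # abar_{t-1}
    posterior_variance = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)  # beta_tilde_t
    return {
        "betas": betas,
        "alphas": alphas,
        "alphas_cumprod": alphas_cumprod,
        "alphas_cumprod_prev": alphas_cumprod_prev,
        "posterior_variance": posterior_variance,
    }

buf = make_schedule_buffers(np.linspace(1e-4, 0.2, 50))   # any schedule works here
print(buf["posterior_variance"][:3])
```

Because $\tilde\beta_1 = 0$, the last reverse step adds no noise. This is consistent with setting $z = 0$ at $t = 1$ during sampling.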

Your to-do: Code up all these variables in utils.py. Feel free to add more variables you need.

2.4 Training with and without Guidance

In each DDPM training iteration, we randomly select a diffusion step $t$ and add random noise $\epsilon$ to the original image $x_0$, using the $\beta$ schedule defined for the forward process, to obtain a noisy image $x_t$. We then pass $x_t$ and $t$ to the model to obtain the estimated noise $\epsilon_\theta$, and compute the loss between the actual noise $\epsilon$ and the estimated noise $\epsilon_\theta$. Different losses can be used, such as L1, L2, or Huber.
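The iteration described above can be sketched as follows. This NumPy version only computes the loss value; in your actual training loop the same computation would run in your deep learning framework and be backpropagated through the model. The `model(x_t, t)` signature, the Huber threshold of 1, and the function name are assumptions.

```python
import numpy as np

def training_loss(model, x0, alphas_cumprod, rng, loss="l2"):
    """Loss of one DDPM iteration (Eq. (9) for "l2") on a batch of 2-pixel images x0 [B, 2].

    model(x_t, t) must return predicted noise with the same shape as x_t.
    """
    B = x0.shape[0]
    T = len(alphas_cumprod)
    t = rng.integers(1, T + 1, size=B)                   # t ~ Uniform{1, ..., T}
    abar = alphas_cumprod[t - 1][:, None]                # [B, 1] for broadcasting
    eps = rng.standard_normal(x0.shape)                  # the noise the model must recover
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps  # Eq. (3)
    diff = eps - model(x_t, t)
    if loss == "l2":
        return np.mean(diff ** 2)
    if loss == "l1":
        return np.mean(np.abs(diff))
    if loss == "huber":                                  # threshold 1 (assumption)
        a = np.abs(diff)
        return np.mean(np.where(a < 1.0, 0.5 * a ** 2, a - 0.5))
    raise ValueError(f"unknown loss {loss!r}")

rng = np.random.default_rng(0)
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.2, 50))   # hypothetical schedule
x0 = np.tile([0.75, -0.45], (100_000, 1))
zero_model = lambda x_t, t: np.zeros_like(x_t)    # baseline that predicts no noise
print(training_loss(zero_model, x0, alphas_cumprod, rng))   # close to E[eps^2] = 1
```

The all-zero baseline gives a useful reference point: a trained model should drive the L2 loss well below 1.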

To sample images, we simply follow the reverse process described in [2]:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \quad \text{where } z \sim \mathcal{N}(0, I) \text{ if } t > 1, \text{ else } z = 0. \qquad (13)$$
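Eq. (13) turns into a short loop from $t = T$ down to $t = 1$. The NumPy sketch below also returns the full trajectory, which is handy for the visualizations in Sec. 4. It is checked with a hypothetical oracle that knows the exact noise when all data sits at a single point $c$. With that oracle, the final step lands exactly on $c$. The function names, the `model(x_t, t)` signature, and the linear schedule are assumptions.

```python
import numpy as np

def sample(model, n, betas, rng, return_trajectory=False):
    """Ancestral sampling with Eq. (13), starting from x_T ~ N(0, I) for n two-pixel images."""
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)
    alphas_cumprod_prev = np.concatenate([[1.0], alphas_cumprod[:-1]])
    sigmas = np.sqrt(betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod))  # sigma_t

    x = rng.standard_normal((n, 2))                    # x_T
    traj = [x]
    for t in range(len(betas), 0, -1):                 # t = T, ..., 1
        i = t - 1
        eps = model(x, np.full(n, t))
        mean = (x - (1.0 - alphas[i]) / np.sqrt(1.0 - alphas_cumprod[i]) * eps) / np.sqrt(alphas[i])
        z = rng.standard_normal(x.shape) if t > 1 else np.zeros_like(x)
        x = mean + sigmas[i] * z                       # x_{t-1}
        traj.append(x)
    return (x, np.stack(traj)) if return_trajectory else x

betas = np.linspace(1e-4, 0.2, 50)                     # hypothetical schedule
abar = np.cumprod(1.0 - betas)
c = np.array([0.75, -0.45])

def oracle(x_t, t):
    """Exact noise if every training image were the single point c."""
    a = abar[t - 1][:, None]
    return (x_t - np.sqrt(a) * c) / np.sqrt(1.0 - a)

x0, traj = sample(oracle, 1000, betas, np.random.default_rng(0), return_trajectory=True)
print(np.abs(x0 - c).max())   # essentially 0: the last step lands on c
```

The trajectory array has shape [T + 1, n, 2]. Indexing it at a fixed step gives the point cloud needed to estimate $p_\theta(x_t \mid t)$.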

We consider both classifier guidance and classifier-free guidance. Classifier guidance requires training a separate classifier and using it to provide the gradient that guides the generation of the diffusion model. Classifier-free guidance is simpler because it does not require training a separate model.

To sample from $p(x \mid y)$, we need an estimate of $\nabla_{x_t} \log p(x_t \mid y)$. Using Bayes' rule, we have:

$$\nabla_{x_t} \log p(x_t \mid y) = \nabla_{x_t} \log p(y \mid x_t) + \nabla_{x_t} \log p(x_t) - \nabla_{x_t} \log p(y) \qquad (14)$$
$$= \nabla_{x_t} \log p(y \mid x_t) + \nabla_{x_t} \log p(x_t), \qquad (15)$$

where the last term of Eq. (14) vanishes because $p(y)$ does not depend on $x_t$. Here $\nabla_{x_t} \log p(y \mid x_t)$ is the classifier gradient, and $\nabla_{x_t} \log p(x_t)$ is the gradient of the model's log-likelihood, also called the score function [7]. For classifier guidance, we could train a classifier $f_\phi$ on noisy images at different diffusion steps and estimate $p(y \mid x_t)$ using $f_\phi(y \mid x_t)$.

Figure 4: Sample trajectories from the same starting point (a 2-pixel image) with different guidance. Setting y = 0 generates a diffusion trajectory toward images of type 1, where the left pixel is darker than the right pixel. Setting y = 1 leads to a diffusion trajectory toward images of type 2, where the left pixel is lighter than the right pixel.
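For the two-pixel mixture, every term in Eq. (15) has a closed form. Under Eq. (3), each mixture component stays Gaussian, with mean $\sqrt{\bar\alpha_t}\,\mu_k$ and per-pixel variance $\bar\alpha_t\sigma^2 + 1 - \bar\alpha_t$. The NumPy sketch below uses this to check Eq. (15) with central finite differences at one arbitrary point. The mixture parameters come from the Figure 2 caption. $\bar\alpha_t = 0.5$ is picked arbitrarily, and all names are hypothetical.

```python
import numpy as np

MEANS = np.array([[-0.35, 0.65], [0.75, -0.45]])   # Figure 2 mixture
WEIGHTS = np.array([0.35, 0.65])
SIGMA = 0.1

def log_joint(x, abar):
    """log w_k + log N(x; sqrt(abar) mu_k, var I) for each component k (shape [K])."""
    m, var = np.sqrt(abar) * MEANS, abar * SIGMA ** 2 + (1.0 - abar)
    sq = np.sum((x - m) ** 2, axis=1)
    return np.log(WEIGHTS) - sq / (2 * var) - np.log(2 * np.pi * var)   # 2-D Gaussian

def log_p_x(x, abar):                    # log p(x_t)
    return np.logaddexp.reduce(log_joint(x, abar))

def log_p_y_given_x(x, y, abar):         # log p(y | x_t): the "classifier"
    lj = log_joint(x, abar)
    return lj[y] - np.logaddexp.reduce(lj)

def log_p_x_given_y(x, y, abar):         # log p(x_t | y)
    return log_joint(x, abar)[y] - np.log(WEIGHTS[y])

def num_grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x_t, y, abar = np.array([0.2, 0.1]), 1, 0.5
lhs = num_grad(lambda v: log_p_x_given_y(v, y, abar), x_t)
rhs = (num_grad(lambda v: log_p_y_given_x(v, y, abar), x_t)
       + num_grad(lambda v: log_p_x(v, abar), x_t))
print(lhs, rhs)   # equal up to finite-difference error, as Eq. (15) states
```

The classifier term alone points from the unconditional score toward the chosen mode. This is the direction that guidance amplifies.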

Classifier-free guidance in DDPM is a technique for generating more controlled and realistic samples without an explicit classifier. It improves the flexibility and quality of the generated images by conditioning the diffusion model on auxiliary information, such as class labels, while allowing the same model to work both conditionally and unconditionally.

For classifier-free guidance, we make a small modification: the model takes an additional input $y$, giving $\epsilon_\theta(x_t, t, y)$. This allows the model to represent $\nabla_{x_t} \log p(x_t \mid y)$. For unconditional generation, we simply set $y = \varnothing$. Rearranging Eq. (15), we have:

$$\nabla_{x_t} \log p(y \mid x_t) = \nabla_{x_t} \log p(x_t \mid y) - \nabla_{x_t} \log p(x_t). \qquad (16)$$

Recall the relationship between score functions and DDPM models, $\epsilon_\theta(x_t, t) \approx -\sqrt{1 - \bar\alpha_t}\,\nabla_{x_t} \log p(x_t)$. Adding the implicit classifier gradient of Eq. (16), scaled by $w$, to the conditional score, we have:

$$\bar\epsilon_\theta(x_t, t, y) = \epsilon_\theta(x_t, t, y) + w\left(\epsilon_\theta(x_t, t, y) - \epsilon_\theta(x_t, t, \varnothing)\right) \qquad (17)$$
$$= (w + 1)\,\epsilon_\theta(x_t, t, y) - w\,\epsilon_\theta(x_t, t, \varnothing), \qquad (18)$$

where $w$ controls the strength of the conditional influence. A value $w > 0$ strengthens the guidance, pushing the generated samples further toward the desired class or conditional distribution.

During training, we randomly drop the class label (replacing it with $\varnothing$) so that the same network also learns the unconditional model. At sampling time, we replace the original $\epsilon_\theta(x_t, t)$ with $(w + 1)\,\epsilon_\theta(x_t, t, y) - w\,\epsilon_\theta(x_t, t, \varnothing)$ to sample with specific class labels (Fig. 4). In short, classifier-free guidance mixes the model's predictions with and without conditioning to produce samples with stronger or weaker guidance.
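Both halves of classifier-free guidance are small array operations. The NumPy sketch below covers label dropout during training and the guided noise estimate of Eq. (18) at sampling time. Several pieces are hypothetical: the null-label index 2, the dropout probability 0.1, the `model(x_t, t, y)` signature, and the toy model used to exercise the code.

```python
import numpy as np

NULL_LABEL = 2   # hypothetical index reserved for y = ∅ (real classes are 0 and 1)

def drop_labels(y, p_uncond, rng):
    """Training: replace each label by the null label with probability p_uncond."""
    mask = rng.random(y.shape) < p_uncond
    return np.where(mask, NULL_LABEL, y)

def guided_eps(model, x_t, t, y, w):
    """Sampling: Eq. (18), (w + 1) * eps(x_t, t, y) - w * eps(x_t, t, ∅)."""
    eps_cond = model(x_t, t, y)
    eps_uncond = model(x_t, t, np.full_like(y, NULL_LABEL))
    return (w + 1.0) * eps_cond - w * eps_uncond

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
y_train = drop_labels(y, p_uncond=0.1, rng=rng)
print(np.mean(y_train == NULL_LABEL))          # about 0.1

toy_model = lambda x, t, y: x + y[:, None]     # stand-in for eps_theta(x_t, t, y)
x_t, t = np.zeros((2, 2)), np.array([5, 5])
print(guided_eps(toy_model, x_t, t, np.array([0, 1]), w=2.0))   # rows: [-4, -4] and [-1, -1]
```

With $w = 0$ the guided estimate reduces to the plain conditional prediction. Larger $w$ extrapolates away from the unconditional one.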

Your to-do: Finish up all the training and sampling functions in utils.py for classifier-free guidance.

3 Starter Code

gmm.py defines the Gaussian mixture model for the ground-truth 2-pixel image distribution. Your training set will be sampled from this distribution. You can leave this file untouched.
ddpm.py defines the model itself. Follow the guidelines above to build your model there.
utils.py defines all the other utility functions, including beta scheduling and the training loop module.
train.py defines the main training loop.


4 Problem Set

1. (40 points) Finish the starter code following the above guidelines. Further changes are also welcome! Please make sure your training and visualization results are reproducible. In your report, state any changes you make and any obstacles you encounter during coding and training, and include a brief README on how to run your code.
2. (20 points) Visualize a particular diffusion trajectory overlaid on the estimated image distribution PDF $p_\theta(x_t \mid t)$ at time steps $t = 10, 20, 30, 40, 50$, given the maximum time step $T = 50$. Estimate the PDF by sampling a large number of starting points and recording where they end up at time $t$, using either 2D histogram binning or Gaussian kernel density estimation. Fig. 5 shows an example result: the denoising trajectory for a specific starting point, overlaid on the ground-truth and estimated PDFs.

In short, visualize such a sample trajectory overlaid on 5 estimated PDFs at $t = 10, 20, 30, 40, 50$, respectively, and on the ground-truth PDF. Briefly describe what you observe.
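For the histogram option, NumPy's `np.histogram2d` with `density=True` already returns a normalized density. In the sketch below, `samples` would be the positions of many reverse trajectories at one step $t$. The function name, bin count, and plotting extent are assumptions; `scipy.stats.gaussian_kde` is a drop-in alternative for the kernel density option.

```python
import numpy as np

def estimate_pdf_hist(samples, bins=60, extent=(-2.0, 2.0)):
    """2D histogram density estimate of samples [N, 2] (left pixel, right pixel).

    Returns density[i, j] for left-pixel bin i and right-pixel bin j, plus bin centers.
    """
    edges = np.linspace(extent[0], extent[1], bins + 1)
    density, _, _ = np.histogram2d(samples[:, 0], samples[:, 1],
                                   bins=[edges, edges], density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return density, centers

rng = np.random.default_rng(0)
samples = 0.5 * rng.standard_normal((50_000, 2))   # placeholder for x_t at one step
density, centers = estimate_pdf_hist(samples)
dx = centers[1] - centers[0]
print(density.sum() * dx * dx)                      # integrates to 1 over the extent
```

When plotting with an image function, transpose `density` so the left pixel runs along the horizontal axis. Keep the same extent for every $t$ so the five panels are comparable.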

Figure 5: Sample denoising trajectory overlaid on the estimated PDF at different steps.
3. (20 points) Train multiple models with different maximum time steps $T = 5, 10, 25, 50$. Sample and denoise 5000 random noise vectors. Visualize a plot with 4 × 2 subplots, where each row corresponds to a different $T$. The first column should overlay the scattered denoised samples on the ground-truth PDF for each $T$. The second column should show the PDF estimated from the denoised samples. An example row is shown in Fig. 6. Describe what you observe in terms of the final distribution.

Note that there are many existing ways [5, 9] to make sampling with fewer time steps work well, even for realistic images.

Figure 6: Sample overlaid scatter with T = 25

4. (20 points) For guided generation, use the same starting noise with different label guidance (y = 0 vs. y = 1). Visualize the different trajectories from the same starting noise $x_T$ that lead to different modes, overlaid on the same ground-truth PDF plot (similar to Fig. 4). Describe what you find.


Figure 7: Sample MNIST images generated by denoising diffusion with classifier-free guidance.

5. (30 points) Extend this model to MNIST images. Actions: add more conv blocks for encoding/decoding; add residual layers and attention in each block; increase the maximum time step to 200 or more. Four blocks per pathway should be enough for MNIST. Generate 10 images for each digit and visualize all the generated images in a 10 × 10 grid (see Fig. 7). Observe and describe the diversity within each category. Visualize one generation trajectory from noise to a clear digit at t = 0, 25, 50, 75, 100, 125, 150, 175, 200. In your report, also answer the question: throughout the generation, is the shape of the digit generated part by part, or all at once?


5 Submission Instructions

This assignment is to be completed individually.
Please upload:

(a) A PDF file with the graphs and explanations: write each problem on a different page.
(b) A folder containing all code files: please leave all your visualization code inside as well, so that we can reproduce your results if we find any graphs strange.

(c) If you believe there may be an error in your code, please provide a written statement in the PDF describing what you think may be wrong and how it affected your results. If necessary, provide pseudocode and/or expected results for any functions you were unable to write.
You may refactor the code as desired, including adding new files. However, if you make substantial changes, please leave detailed comments and use reasonable file names. You are not required to create separate files for every model training/testing run: commenting out parts of the code for different runs, as in the scaffold, is all right (just add some explanation).

References

[1] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning". In: CoRR abs/1702.03118 (2017). arXiv: 1702.03118. URL: http://arxiv.org/abs/1702.03118.
[2] Jonathan Ho, Ajay Jain, and Pieter Abbeel. "Denoising Diffusion Probabilistic Models". In: arXiv preprint arXiv:2006.11239 (2020).
[3] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". 2022. arXiv: 1312.6114 [stat.ML]. URL: https://arxiv.org/abs/1312.6114.
[4] Alex Nichol and Prafulla Dhariwal. "Improved Denoising Diffusion Probabilistic Models". In: CoRR abs/2102.09672 (2021). arXiv: 2102.09672. URL: https://arxiv.org/abs/2102.09672.
[5] Tim Salimans and Jonathan Ho. "Progressive Distillation for Fast Sampling of Diffusion Models". 2022. arXiv: 2202.00512 [cs.LG]. URL: https://arxiv.org/abs/2202.00512.
[6] Jascha Sohl-Dickstein et al. "Deep Unsupervised Learning using Nonequilibrium Thermodynamics". 2015. arXiv: 1503.03585 [cs.LG]. URL: https://arxiv.org/abs/1503.03585.
[7] Yang Song and Stefano Ermon. "Generative Modeling by Estimating Gradients of the Data Distribution". In: CoRR abs/1907.05600 (2019). arXiv: 1907.05600. URL: http://arxiv.org/abs/1907.05600.
[8] Lilian Weng. "What are diffusion models?" In: lilianweng.github.io (July 2021). URL: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/.
[9] Qinsheng Zhang and Yongxin Chen. "Fast Sampling of Diffusion Models with Exponential Integrator". 2023. arXiv: 2204.13902 [cs.LG]. URL: https://arxiv.org/abs/2204.13902.
