---
library_name: tf-keras
tags:
- generative
- denoising
- diffusion
- ddim
- ddpm
- unconditional-image-generation
---

This model was created for the [Keras code example](https://keras.io/examples/generative/ddim/) on [denoising diffusion implicit models (DDIM)](https://arxiv.org/abs/2010.02502).

## Model description

The model uses a [U-Net](https://arxiv.org/abs/1505.04597) with identical input and output dimensions. It progressively downsamples and then upsamples its input image, with skip connections between layers of the same resolution. The architecture is a simplified version of the one used in [DDPM](https://arxiv.org/abs/2006.11239): it consists of convolutional residual blocks and lacks attention layers. The network takes two inputs, the noisy images and the variances of their noise components, which it encodes using [sinusoidal embeddings](https://arxiv.org/abs/1706.03762).
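
As an illustration, the sinusoidal encoding of the noise variances can be sketched in NumPy roughly as follows. This is a simplified sketch, not the actual implementation: the function and parameter names are illustrative, and the defaults echo the embedding values from the hyperparameter table below.

```python
import numpy as np

def sinusoidal_embedding(noise_variances, embedding_dims=32, max_frequency=1000.0):
    """Encode scalar noise variances as Transformer-style sinusoidal features."""
    # Frequencies spaced geometrically between 1.0 and max_frequency.
    frequencies = np.exp(
        np.linspace(np.log(1.0), np.log(max_frequency), embedding_dims // 2)
    )
    angular_speeds = 2.0 * np.pi * frequencies
    # Broadcast each variance against every frequency, then concatenate
    # sine and cosine features along the channel axis.
    angles = angular_speeds * noise_variances[..., None]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

emb = sinusoidal_embedding(np.array([0.1, 0.5, 0.9]))
print(emb.shape)  # (3, 32)
```

The geometric frequency spacing lets the network distinguish both very small and very large noise variances from a single scalar input.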

## Intended uses & limitations

The model is intended for educational purposes, as a simple example of denoising diffusion generative models. It has modest compute requirements while achieving reasonable generation quality on natural images.

## Training and evaluation data

The model is trained on the [Oxford Flowers 102](https://www.tensorflow.org/datasets/catalog/oxford_flowers102) dataset, a diverse natural image dataset containing around 8,000 images of flowers. Since the official splits are imbalanced (most of the images are contained in the test split), new random splits were created (80% train, 20% validation) for training the model. Center crops were used for preprocessing.
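
Center cropping keeps the largest centered square of each image; the square is then resized to the training resolution with an image library. A minimal NumPy sketch of the crop step (the function name is illustrative, not from the actual code):

```python
import numpy as np

def center_crop(image):
    """Crop the largest centered square from an H x W x C image array."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return image[top:top + side, left:left + side]

landscape = np.zeros((100, 150, 3))
print(center_crop(landscape).shape)  # (100, 100, 3)
```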

## Training procedure

The model is trained to denoise noisy images, and can generate images by iteratively denoising pure Gaussian noise. 
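The deterministic DDIM sampling loop can be sketched in NumPy as follows. Starting from pure Gaussian noise, each step separates the current sample into a predicted image and predicted noise, then remixes them at the next (higher) signal rate along a cosine schedule. This is a simplified sketch under the min/max signal rates listed below, not the actual implementation; `denoise_fn` stands in for the trained network.

```python
import numpy as np

def ddim_sample(denoise_fn, shape, steps=20,
                min_signal_rate=0.02, max_signal_rate=0.95, seed=0):
    """Deterministic DDIM sampling sketch.

    denoise_fn(x, noise_variance) stands in for the trained network,
    which predicts the noise component of x.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    # Diffusion times run from 1 (all noise) down to 0 (all signal).
    diffusion_times = np.linspace(1.0, 0.0, steps + 1)
    # Cosine schedule: signal_rate = cos(angle), noise_rate = sin(angle),
    # so signal_rate**2 + noise_rate**2 == 1 at every step.
    start_angle = np.arccos(max_signal_rate)
    end_angle = np.arccos(min_signal_rate)
    for t, t_next in zip(diffusion_times[:-1], diffusion_times[1:]):
        angle = start_angle + t * (end_angle - start_angle)
        next_angle = start_angle + t_next * (end_angle - start_angle)
        signal_rate, noise_rate = np.cos(angle), np.sin(angle)
        next_signal_rate, next_noise_rate = np.cos(next_angle), np.sin(next_angle)
        pred_noise = denoise_fn(x, noise_rate**2)
        pred_image = (x - noise_rate * pred_noise) / signal_rate
        # Remix predicted image and noise at the next, less noisy step.
        x = next_signal_rate * pred_image + next_noise_rate * pred_noise
    return pred_image

# Dummy denoiser in place of the trained network, just to show the shapes.
samples = ddim_sample(lambda x, var: 0.1 * x, shape=(2, 8, 8, 3), steps=5)
print(samples.shape)  # (2, 8, 8, 3)
```

Fewer steps trade generation quality for speed; with a trained network, 20 steps is typically enough for recognizable samples.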

For more details, check out the [Keras code example](https://keras.io/examples/generative/ddim/) or the companion [code repository](https://github.com/beresandras/clear-diffusion-keras), which includes additional features.

## Training hyperparameters

| Hyperparameter | Value |
| :-- | :-- |
| num epochs | 80 |
| dataset repetitions per epoch | 5 |
| image resolution | 64 |
| min signal rate | 0.02 |
| max signal rate | 0.95 |
| embedding dimensions | 32 |
| embedding max frequency | 1000.0 |
| block widths | 32, 64, 96, 128 |
| block depth | 2 |
| batch size | 64 |
| exponential moving average | 0.999 |
| optimizer | [AdamW](https://arxiv.org/abs/1711.05101) |
| learning rate | 1e-3 |
| weight decay | 1e-4 |
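
The exponential moving average above refers to a shadow copy of the network weights that is updated after every training step and used for generation, since EMA weights typically produce cleaner samples. A minimal sketch of one such update, using NumPy arrays in place of Keras variables (names are illustrative):

```python
import numpy as np

def ema_update(ema_weights, model_weights, decay=0.999):
    """One exponential-moving-average step over per-layer weight arrays."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]

ema = [np.zeros(3)]      # EMA copy of a single weight tensor
current = [np.ones(3)]   # weights after one optimizer step
ema = ema_update(ema, current)
# each entry: 0.999 * 0 + 0.001 * 1 = 0.001
```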

## Model plot

<details>
<summary>View model plot</summary>

![network architecture residual unet](./model.png)

</details>