---
license: mit
tags:
- mdm
---

# Matryoshka Diffusion Models

Matryoshka Diffusion Models (MDM) were introduced in [the paper of the same name](https://huggingface.co/papers/2310.15111) by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, and Navdeep Jaitly.

This repository contains the **Flickr 256** checkpoint.

![Generation Examples from the MDM repository](samples.png)

## Highlights

* This checkpoint was trained on a dataset of 50M text-image pairs collected from Flickr.
* This model was trained using nested UNets at various resolutions, and generates images with a resolution of 256 × 256.
* Despite being trained on relatively small datasets, MDMs show strong zero-shot capability in generating high-resolution images and videos.
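
To make "various resolutions" concrete, here is a rough sketch of a multi-resolution image pyramid of the kind a nested model could operate on. It is not taken from the MDM code: the helper names `average_pool` and `resolution_pyramid` are hypothetical, average pooling is only one possible way to downsample, and the choice of 64 and 256 as the levels for this checkpoint is an assumption.

```python
def average_pool(image, factor):
    """Downsample a square grayscale image (a list of rows) by
    averaging non-overlapping factor x factor blocks."""
    size = len(image)
    if factor < 1 or size % factor != 0:
        raise ValueError("image size must be divisible by factor")
    out = size // factor
    area = factor * factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / area
            for c in range(out)
        ]
        for r in range(out)
    ]


def resolution_pyramid(image, sizes):
    """Return {size: downsampled image} for each requested size.

    Hypothetical helper: every size must divide the full resolution.
    """
    full = len(image)
    pyramid = {}
    for size in sizes:
        if size < 1 or full % size != 0:
            raise ValueError(f"{size} does not divide {full}")
        pyramid[size] = average_pool(image, full // size)
    return pyramid


# Synthetic 256 x 256 test image (a simple diagonal gradient).
image = [[float((r + c) % 256) for c in range(256)] for r in range(256)]

# Assumed levels for illustration only: 64 x 64 and 256 x 256.
pyramid = resolution_pyramid(image, sizes=(64, 256))
```

Each entry of `pyramid` holds the same image at a different resolution. How MDM actually couples the levels is described in the paper and implemented in the original repository.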

## Checkpoints

| Model                                                   | Dataset    | Resolution  | Nested UNets |
|---------------------------------------------------------|------------|-------------|--------------|
| [mdm-flickr-64](https://hf.co/pcuenq/mdm-flickr-64)     | Flickr 50M | 64 × 64     | ❎            |
| [mdm-flickr-256](https://hf.co/pcuenq/mdm-flickr-256)   | Flickr 50M | 256 × 256   | ✅            |
| [mdm-flickr-1024](https://hf.co/pcuenq/mdm-flickr-1024) | Flickr 50M | 1024 × 1024 | ✅            |

## How to Use

Please refer to the [original repository](https://github.com/apple/ml-mdm) for training and inference instructions.
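
If you only need the checkpoint files on disk before following those instructions, one option is to fetch them with the `huggingface_hub` library. This is a minimal sketch, assuming `huggingface_hub` is installed and that the repository names match the table above. The helpers `checkpoint_repo` and `download_checkpoint` are hypothetical names introduced here, and the snippet does not run inference.

```python
# Repository ids copied from the Checkpoints table in this model card.
CHECKPOINTS = {
    64: "pcuenq/mdm-flickr-64",
    256: "pcuenq/mdm-flickr-256",
    1024: "pcuenq/mdm-flickr-1024",
}


def checkpoint_repo(resolution: int) -> str:
    """Map an output resolution to its Hub repository id."""
    try:
        return CHECKPOINTS[resolution]
    except KeyError:
        raise ValueError(
            f"no checkpoint listed for resolution {resolution}; "
            f"choose one of {sorted(CHECKPOINTS)}"
        ) from None


def download_checkpoint(resolution: int = 256) -> str:
    """Download every file in the repository and return the local folder.

    Imported lazily so the lookup above works without the dependency.
    Requires: pip install huggingface_hub
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(resolution))


if __name__ == "__main__":
    local_dir = download_checkpoint(256)
    print(f"Checkpoint files downloaded to: {local_dir}")
```

The returned folder can then be passed to the tooling in the original repository, which documents the expected configuration and command-line options.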

## Citation

```
@misc{gu2023matryoshkadiffusionmodels,
      title={Matryoshka Diffusion Models},
      author={Jiatao Gu and Shuangfei Zhai and Yizhe Zhang and Josh Susskind and Navdeep Jaitly},
      year={2023},
      eprint={2310.15111},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2310.15111},
}
```