	---
	license: mit
	tags:
	- mdm
	---

	# Matryoshka Diffusion Models

	Matryoshka Diffusion Models (MDM) were introduced in [the paper of the same name](https://huggingface.co/papers/2310.15111) by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, and Navdeep Jaitly.

	This repository contains the Flickr 256 checkpoint.

	![Generation Examples from the MDM repository](samples.png)

	### Highlights

	* This checkpoint was trained on a dataset of 50M text-image pairs collected from Flickr.
	* This model was trained using nested UNets at various resolutions, and generates images with a resolution of 256 × 256.
	* Despite being trained on relatively small datasets, MDMs show strong zero-shot capability in generating high-resolution images and videos.

	## Checkpoints

	| Model | Dataset | Resolution | Nested UNets |
	|---------------------------------------------------------|------------|-------------|--------------|
	| [mdm-flickr-64](https://hf.co/pcuenq/mdm-flickr-64) | Flickr 50M | 64 × 64 | ❎ |
	| [mdm-flickr-256](https://hf.co/pcuenq/mdm-flickr-256) | Flickr 50M | 256 × 256 | ✅ |
	| [mdm-flickr-1024](https://hf.co/pcuenq/mdm-flickr-1024) | Flickr 50M | 1024 × 1024 | ✅ |

	## How to Use

	Please refer to the [original repository](https://github.com/apple/ml-mdm) for training and inference instructions.
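
Before following those instructions, the checkpoint files need to be available locally. The snippet below is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed. The helper names `repo_id_for` and `download_checkpoint` are hypothetical and are not part of the ml-mdm code base; the repo ids come from the Checkpoints table above.

```python
# Resolutions listed in the Checkpoints table of this model card.
AVAILABLE_RESOLUTIONS = (64, 256, 1024)


def repo_id_for(resolution: int) -> str:
    """Return the Hub repo id of a Flickr checkpoint (hypothetical helper)."""
    if resolution not in AVAILABLE_RESOLUTIONS:
        raise ValueError(
            f"No checkpoint for resolution {resolution}; "
            f"expected one of {AVAILABLE_RESOLUTIONS}"
        )
    return f"pcuenq/mdm-flickr-{resolution}"


def download_checkpoint(resolution: int = 256) -> str:
    """Download every file in the checkpoint repo and return the local folder."""
    # Imported lazily so repo_id_for() works even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(resolution))
```

Calling `download_checkpoint(256)` returns the path of the local cache folder holding this repository's files. How those files are then passed to the training and inference scripts is described in the original repository and is not assumed here.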

	## Citation

	```
	@misc{gu2023matryoshkadiffusionmodels,
	title={Matryoshka Diffusion Models},
	author={Jiatao Gu and Shuangfei Zhai and Yizhe Zhang and Josh Susskind and Navdeep Jaitly},
	year={2023},
	eprint={2310.15111},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2310.15111},
	}
	```