|
--- |
|
license: openrail++ |
|
tags: |
|
- stable-diffusion |
|
- text-to-image |
|
- core-ml |
|
--- |
|
|
|
# Stable Diffusion v2-1-base Model Card |
|
|
|
This model was generated by Hugging Face using [Apple's repository](https://github.com/apple/ml-stable-diffusion), which is distributed under the [ASCL](https://github.com/apple/ml-stable-diffusion/blob/main/LICENSE.md). This version contains 6-bit palettized Core ML weights for iOS 17 or macOS 14. To use the weights without quantization, please visit [this model instead](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base).
|
|
|
This model card focuses on the model associated with the Stable Diffusion v2-1-base model. |
|
|
|
This `stable-diffusion-2-1-base` model fine-tunes [stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base) (`512-base-ema.ckpt`) with 220k additional steps on the same dataset, using `punsafe=0.98`.
|
|
|
The weights here have been converted to Core ML for use on Apple Silicon hardware.
|
|
|
There are 4 variants of the Core ML weights: |
|
|
|
```
coreml-stable-diffusion-2-1-base
├── original
│   ├── compiled          # Swift inference, "original" attention
│   └── packages          # Python inference, "original" attention
└── split_einsum
    ├── compiled          # Swift inference, "split_einsum" attention
    └── packages          # Python inference, "split_einsum" attention
```
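
As a rough sketch of how one of these variants can be fetched programmatically, the snippet below uses `huggingface_hub.snapshot_download` to pull only the `split_einsum/packages` folder (for Python inference). The repo id used here is an assumption and should be checked against the actual repository name on the Hub.

```python
# Sketch: download a single Core ML variant from the Hub.
# The repo id below is assumed; adjust it to match this repository.
from pathlib import Path

from huggingface_hub import snapshot_download

repo_id = "apple/coreml-stable-diffusion-2-1-base-palettized"  # assumed repo id
variant = "split_einsum/packages"  # Python inference, "split_einsum" attention

local_dir = Path("models") / repo_id.split("/")[-1]
snapshot_download(
    repo_id,
    allow_patterns=f"{variant}/*",  # fetch only the chosen variant
    local_dir=local_dir,
)
print(f"Model files downloaded to {local_dir / variant}")
```

The downloaded folder can then be used with the Python inference pipeline from Apple's repository, as described in the blog post linked further below.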
|
|
|
There are also two zip archives suitable for use in the [Hugging Face demo app](https://github.com/huggingface/swift-coreml-diffusers) and other third-party tools (a download sketch follows the list):
|
|
|
- `coreml-stable-diffusion-2-1-base-palettized_original_compiled.zip` contains the compiled, 6-bit model with `ORIGINAL` attention implementation. |
|
- `coreml-stable-diffusion-2-1-base-palettized_split_einsum_v2_compiled.zip` contains the compiled, 6-bit model with `SPLIT_EINSUM_V2` attention implementation. |
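
If you prefer to grab one of these archives directly, a minimal sketch using `huggingface_hub.hf_hub_download` is shown below; the repo id is an assumption, and the archive is simply unzipped locally afterwards.

```python
# Sketch: download and extract one of the compiled zip archives listed above.
# The repo id is assumed; the filename matches the first archive in the list.
import zipfile

from huggingface_hub import hf_hub_download

zip_path = hf_hub_download(
    repo_id="apple/coreml-stable-diffusion-2-1-base-palettized",  # assumed
    filename="coreml-stable-diffusion-2-1-base-palettized_original_compiled.zip",
)

with zipfile.ZipFile(zip_path) as archive:
    archive.extractall("compiled_model")  # compiled weights for Swift tools
```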
|
|
|
Please refer to [this blog post](https://huggingface.co/blog/diffusers-coreml) for details.
|
|
|
- Use it with 🧨 [`diffusers`](https://huggingface.co/stabilityai/stable-diffusion-2-1-base#examples); a short example is sketched after this list.
|
- Use it with the [`stablediffusion`](https://github.com/Stability-AI/stablediffusion) repository: download the `v2-1_512-ema-pruned.ckpt` [here](https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt). |
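
As a quick illustration of the `diffusers` route (which uses the original PyTorch weights rather than the Core ML variants above), a minimal sketch along the lines of the linked examples might look like this:

```python
# Minimal diffusers sketch for stable-diffusion-2-1-base (PyTorch weights,
# not the Core ML variants above); mirrors the examples linked in the list.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # or "mps" on Apple Silicon

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```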
|
|
|
## Model Details |
|
- **Developed by:** Robin Rombach, Patrick Esser |
|
- **Model type:** Diffusion-based text-to-image generation model |
|
- **Language(s):** English |
|
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL) |
|
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip)). |
|
- **Resources for more information:** [GitHub Repository](https://github.com/Stability-AI/). |
|
- **Cite as:** |
|
|
|
    @InProceedings{Rombach_2022_CVPR,
        author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
        title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2022},
        pages     = {10684-10695}
    }
|
|
|
*This model was quantized by Vishnou Vinayagame and adapted from the original by Pedro Cuenca, itself adapted from the work of Robin Rombach, Patrick Esser and David Ha.*
|
*This model card was adapted by Pedro Cuenca from the original written by: Robin Rombach, Patrick Esser and David Ha and is based on the [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md) and [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).* |
|
|