|
--- |
|
license: openrail++ |
|
tags: |
|
- stable-diffusion |
|
- text-to-image |
|
- core-ml |
|
--- |
|
|
|
# Stable Diffusion v2-1-base Model Card |
|
|
|
This model was generated by Hugging Face using [Apple's repository](https://github.com/apple/ml-stable-diffusion), which is distributed under the [ASCL](https://github.com/apple/ml-stable-diffusion/blob/main/LICENSE.md). This version contains 6-bit palettized Core ML weights for iOS 17 or macOS 14. To use the weights without quantization, please visit [this model instead](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base).
|
|
|
This model card focuses on the model associated with the Stable Diffusion v2-1-base model. |
|
|
|
This `stable-diffusion-2-1-base` model fine-tunes [stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base) (`512-base-ema.ckpt`) with 220k additional steps on the same dataset, using `punsafe=0.98`.
|
|
|
The weights here have been converted to Core ML for use on Apple Silicon hardware.
|
|
|
There are 4 variants of the Core ML weights: |
|
|
|
```
coreml-stable-diffusion-2-1-base
├── original
│   ├── compiled          # Swift inference, "original" attention
│   └── packages          # Python inference, "original" attention
└── split_einsum
    ├── compiled          # Swift inference, "split_einsum" attention
    └── packages          # Python inference, "split_einsum" attention
```
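
As a rough sketch of how one of these variants can be fetched programmatically, the snippet below uses `huggingface_hub.snapshot_download` to pull only the `split_einsum/packages` folder (for Python inference). The repo id used here is an assumption and should be checked against the actual repository name on the Hub.

```python
# Sketch: download a single Core ML variant from the Hub.
# The repo id below is assumed; adjust it to match this repository.
from pathlib import Path

from huggingface_hub import snapshot_download

repo_id = "apple/coreml-stable-diffusion-2-1-base-palettized"  # assumed repo id
variant = "split_einsum/packages"  # Python inference, "split_einsum" attention

local_dir = Path("models") / repo_id.split("/")[-1]
snapshot_download(
    repo_id,
    allow_patterns=f"{variant}/*",  # fetch only the chosen variant
    local_dir=local_dir,
)
print(f"Model files downloaded to {local_dir / variant}")
```

The downloaded folder can then be used with the Python inference pipeline from Apple's repository, as described in the blog post linked further below.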
|
|
|
There are also two zip archives suitable for use in the [Hugging Face demo app](https://github.com/huggingface/swift-coreml-diffusers) and other third-party tools (a download sketch follows the list):
|
|
|
- `coreml-stable-diffusion-2-1-base-palettized_original_compiled.zip` contains the compiled, 6-bit model with `ORIGINAL` attention implementation. |
|
- `coreml-stable-diffusion-2-1-base-palettized_split_einsum_v2_compiled.zip` contains the compiled, 6-bit model with `SPLIT_EINSUM_V2` attention implementation. |
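
If you prefer to grab one of these archives directly, a minimal sketch using `huggingface_hub.hf_hub_download` is shown below; the repo id is an assumption, and the archive is simply unzipped locally afterwards.

```python
# Sketch: download and extract one of the compiled zip archives listed above.
# The repo id is assumed; the filename matches the first archive in the list.
import zipfile

from huggingface_hub import hf_hub_download

zip_path = hf_hub_download(
    repo_id="apple/coreml-stable-diffusion-2-1-base-palettized",  # assumed
    filename="coreml-stable-diffusion-2-1-base-palettized_original_compiled.zip",
)

with zipfile.ZipFile(zip_path) as archive:
    archive.extractall("compiled_model")  # compiled weights for Swift tools
```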
|
|
|
Please refer to [this blog post](https://huggingface.co/blog/diffusers-coreml) for details.
|
|
|
- Use it with 🧨 [`diffusers`](https://huggingface.co/stabilityai/stable-diffusion-2-1-base#examples); a short example is sketched after this list.
|
- Use it with the [`stablediffusion`](https://github.com/Stability-AI/stablediffusion) repository: download the `v2-1_512-ema-pruned.ckpt` [here](https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt). |
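
As a quick illustration of the `diffusers` route (which uses the original PyTorch weights rather than the Core ML variants above), a minimal sketch along the lines of the linked examples might look like this:

```python
# Minimal diffusers sketch for stable-diffusion-2-1-base (PyTorch weights,
# not the Core ML variants above); mirrors the examples linked in the list.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # or "mps" on Apple Silicon

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```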
|
|
|
## Model Details |
|
- **Developed by:** Robin Rombach, Patrick Esser |
|
- **Model type:** Diffusion-based text-to-image generation model |
|
- **Language(s):** English |
|
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL) |
|
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip)). |
|
- **Resources for more information:** [GitHub Repository](https://github.com/Stability-AI/). |
|
- **Cite as:** |
|
|
|
    @InProceedings{Rombach_2022_CVPR,
        author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
        title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2022},
        pages     = {10684-10695}
    }
|
|
|
*This model was quantized by Vishnou Vinayagame and adapted from the original by Pedro Cuenca, itself adapted from the work of Robin Rombach, Patrick Esser and David Ha.*
|
*This model card was adapted by Pedro Cuenca from the original written by: Robin Rombach, Patrick Esser and David Ha and is based on the [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md) and [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).* |
|
|