File size: 7,140 Bytes
aa9b327 eaf367a 905f423 07d3566 eaf367a d8f4457 aa9b327 fc78795 75d30a7 6a8fdba c234f4a aa9b327 e36b0e9 aa9b327 882b6cd eaf367a 38f3abf f606969 38f3abf 882b6cd 38f3abf 6ace42a 38f3abf aa9b327 fc78795 aa9b327 38f3abf 6a8fdba 257fe1e 6a8fdba d19ac48 6a8fdba 49a23c0 6a8fdba 58ea16c 6820509 4b0c1d7 6820509 75d30a7 4898144 75d30a7 e36b0e9 75d30a7 4b0c1d7 75d30a7 4898144 110f60a 75d30a7 4898144 4b0c1d7 75d30a7 4898144 75d30a7 27fb7b3 9fda058 27fb7b3 aa9b327 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 |
---
license: openrail++
language:
- en
- ja
- es
- zh
widget:
- text: a beautiful illustration of a fantasy forest
example_title: Fantasy Forest
- text: environment concept art
example_title: Concept Art 1
tags:
- stable-diffusion
- sygil-diffusion
- text-to-image
- sygil-devs
- finetune
- stable-diffusion-1.5
inference: true
pinned: true
metrics:
- accuracy
- bertscore
- bleu
- bleurt
- brier_score
- cer
- character
- charcut_mt
- chrf
- code_eval
---
# About the model
-----------------
This model is a fine-tune of Stable Diffusion, trained on the [Imaginary Network Expanded Dataset](https://github.com/Sygil-Dev/INE-dataset), with the big advantage of allowing the use of multiple namespaces (labeled tags) to control various parts of the final generation.
While current models usually are prone to “context errors” and need substantial negative prompting to set them on the right track, the use of namespaces in this model (eg. “species:seal” or “studio:dc”) stop the model from misinterpreting a seal as the singer Seal, or DC Comics as Washington DC.
This model is also able to understand other languages besides English, currently it can partially understand prompts in Chinese, Japanese and Spanish. More training is already being done in order to have the model completely understand those languages and have it work just like how it works with English prompts.
As the model is fine-tuned on a wide variety of content, it’s able to generate many types of images and compositions, and easily outperforms the original model when it comes to portraits, architecture, reflections, fantasy, concept art, anime, landscapes and a lot more without being hyper-specialized like other community fine-tunes that are currently available.
**Note: The prompt engineering techniques needed are slightly different from other fine-tunes and the original Stable Diffusion model, so while you can still use your favorite prompts, for best results you might need to tweak them to make use of namespaces. A more detailed guide will be available later on, but you can use the tags and namespaces found here [Dataset Explorer](https://huggingface.co/spaces/Sygil/INE-dataset-explorer) should be able to start you off on the right track.
If you find our work useful, please consider supporting us on [OpenCollective](https://opencollective.com/sygil_dev)!
This model is still in its infancy and it's meant to be constantly updated and trained with more and more data as time goes by, so feel free to give us feedback on our [Discord Server](https://discord.gg/UjXFsf6mTu) or on the discussions section on huggingface. We plan to improve it with more, better tags in the future, so any help is always welcome 😛
[![Join the Discord Server](https://badgen.net/discord/members/fTtcufxyHQ?icon=discord)](https://discord.gg/UjXFsf6mTu)
# Showcase
![Showcase image](pictures/showcase-6.jpg)
## Examples
Using the [🤗's Diffusers library](https://github.com/huggingface/diffusers) to run Sygil Diffusion in a simple and efficient manner.
```bash
pip install diffusers transformers accelerate scipy safetensors
```
Running the pipeline (if you don't swap the scheduler it will run with the default DDIM, in this example we are swapping it to DPMSolverMultistepScheduler):
```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
model_id = "Sygil/Sygil-Diffusion"
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a beautiful illustration of a fantasy forest"
image = pipe(prompt).images[0]
image.save("fantasy_forest_illustration.png")
```
**Notes**:
- Despite not being a dependency, we highly recommend you to install [xformers](https://github.com/facebookresearch/xformers) for memory efficient attention (better performance)
- If you have low GPU RAM available, make sure to add a `pipe.enable_attention_slicing()` after sending it to `cuda` for less VRAM usage (to the cost of speed).
## Available Checkpoints:
- #### Stable:
- [Sygil Diffusion v0.1](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.1.ckpt): Trained on Stable Diffusion 1.5 for 800,000 steps.
- [Sygil Diffusion v0.2](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.2.ckpt): Resumed from Sygil Diffusion v0.1 and trained for a total of 1.77 million steps.
- #### Beta:
- [sygil-diffusion-v0.3_1950288_lora.ckpt](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.3_1950288_lora.ckpt): Resumed from Sygil Diffusion v0.2 and trained for a total of 1.95 million steps so far.
Note: Checkpoints under the Beta section are updated daily or at least 3-4 times a week. This is usually the equivalent of 1-2 training session,
this is done until they are stable enough to be moved into a proper release, usually every 1 or 2 weeks.
While the beta checkpoints can be used as they are only the latest version is kept on the repo and the older checkpoints are removed when a new one
is uploaded to keep the repo clean. The HuggingFace inference API as well as the diffusers library will always use the latest beta checkpoint in the diffusers format.
For special cases we might make additional repositories to keep a copy of the diffusers model like when a model uses a different Stable Diffusion model as base (eg. Stable Diffusion 1.5 vs 2.1).
## Training
**Training Data**:
The model was trained on the following dataset:
- [Imaginary Network Expanded Dataset](https://github.com/Sygil-Dev/INE-dataset) dataset.
**Hardware and others**
- **Hardware:** 1 x Nvidia RTX 3050 8GB GPU
- **Hours Trained:** 743 hours approximately.
- **Optimizer:** AdamW
- **Adam Beta 1**: 0.9
- **Adam Beta 2**: 0.999
- **Adam Weight Decay**: 0.01
- **Adam Epsilon**: 1e-8
- **Gradient Checkpointing**: True
- **Gradient Accumulations**: 4
- **Batch:** 1
- **Learning rate:** warmup to 1e-7 for 10,000 steps and then kept constant
- **Lora unet Learning Rate**: 1e-7
- **Lora Text Encoder Learning Rate**: 1e-7
- **Resolution**: 512 pixels
- **Total Training Steps:** 1,950,288
Developed by: [ZeroCool94](https://github.com/ZeroCool940711) at [Sygil-Dev](https://github.com/Sygil-Dev/)
## Community Contributions:
- [Kevin Turner (keturn)](https://huggingface.co/keturn): created the [INE-dataset-explorer](https://huggingface.co/spaces/Sygil/INE-dataset-explorer) space for better browsing of the INE dataset.
*This model card is based on the [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md) and [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*
# License
This model is open access and available to all, with a CreativeML Open RAIL++-M License further specifying rights and usage. [Please read the full license here](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL) |