---
language:
- en
library_name: diffusers
inference: true
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
  - text-to-image
  - stable-diffusion
  - diffusers
base_model:
  - stabilityai/stable-diffusion-3.5-large
  - stabilityai/stable-diffusion-3.5-large-turbo
base_model_relation: merge
---

# Stable Diffusion 3.5 Merged

This repository contains the merged version of **Stable Diffusion 3.5**, combining the best features from both the [**Large**](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) and [**Turbo**](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo) variants. 

| Large | Turbo | Merged |
| :--: | :--: | :--: |
| ![](./assets/large.png) | ![](./assets/turbo.png) | ![](./assets/sd-3.5-merged.png) |


## Inference

Run the following code to generate images using the merged model:

```python
from diffusers import StableDiffusion3Pipeline
import torch

pipeline = StableDiffusion3Pipeline.from_pretrained(
    "ariG23498/sd-3.5-merged", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(
    prompt=prompt,
    guidance_scale=1.0,
    num_inference_steps=6,  # Run faster ⚡️
    generator=torch.manual_seed(0),
).images[0]
image.save("sd-3.5-merged.png")
```

> **Note**: The Turbo variant runs in just a few steps, while the Large variant needs more steps (around 50) but delivers finer detail.
> With the merged model, you will need to tune `num_inference_steps` and `guidance_scale` to strike the right balance between speed and quality.
> The grid below shows several guidance scale and step settings together with the corresponding generations.

![](./assets/grid.png)
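
If you want to reproduce such a sweep yourself, a minimal sketch (the specific scale and step values are illustrative, not prescriptive) could look like this, reusing the `pipeline` and `prompt` from the inference snippet above:

```python
import itertools

# Sweep a few (guidance_scale, num_inference_steps) combinations and save
# one image per setting; reuses `pipeline` and `prompt` defined above.
for scale, steps in itertools.product([1.0, 3.5, 7.0], [4, 6, 12]):
    image = pipeline(
        prompt=prompt,
        guidance_scale=scale,
        num_inference_steps=steps,
        generator=torch.manual_seed(0),
    ).images[0]
    image.save(f"sd-3.5-merged_scale-{scale}_steps-{steps}.png")
```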

## Merging Models

This repository merges the **Stable Diffusion 3.5 Large** and **Stable Diffusion 3.5 Turbo** models into a single, powerful model. The Large version uses classifier-free guidance (CFG) and requires more steps, while the Turbo version is distilled for faster generation without CFG. 

The merged model retains the detail of the Large version and the speed of the Turbo version.
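
Concretely, every matching parameter is averaged elementwise, while the guidance-embedding weights (which exist only in the Large, CFG-trained model) are carried over unchanged. A minimal sketch of that rule (the function name is illustrative, not part of the script below):

```python
import torch

def merge_param(large: torch.Tensor, turbo: torch.Tensor, key: str) -> torch.Tensor:
    """Merge rule: keep guidance weights from Large, average everything else."""
    if "guidance" in key:
        return large            # guidance embedding exists only in the Large model
    return (large + turbo) / 2  # elementwise mean of the two checkpoints
```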

### Code to Merge Models

```python
from diffusers import SD3Transformer2DModel
from diffusers.models.model_loading_utils import load_model_dict_into_meta
from accelerate import init_empty_weights
from huggingface_hub import snapshot_download, upload_folder
import safetensors.torch
import glob
import torch

large_model_id = "stabilityai/stable-diffusion-3.5-large"
turbo_model_id = "stabilityai/stable-diffusion-3.5-large-turbo"

# Instantiate the transformer on the meta device so no weights are allocated yet.
with init_empty_weights():
    config = SD3Transformer2DModel.load_config(large_model_id, subfolder="transformer")
    model = SD3Transformer2DModel.from_config(config)

# Download only the transformer shards of each checkpoint.
large_ckpt = snapshot_download(repo_id=large_model_id, allow_patterns="transformer/*")
turbo_ckpt = snapshot_download(repo_id=turbo_model_id, allow_patterns="transformer/*")

large_shards = sorted(glob.glob(f"{large_ckpt}/transformer/*.safetensors"))
turbo_shards = sorted(glob.glob(f"{turbo_ckpt}/transformer/*.safetensors"))

merged_state_dict = {}
guidance_state_dict = {}

# Merge shard by shard: average every parameter, except the guidance-embedding
# weights, which exist only in the Large (CFG) model and are kept unchanged.
for large_shard, turbo_shard in zip(large_shards, turbo_shards):
    state_dict_large_temp = safetensors.torch.load_file(large_shard)
    state_dict_turbo_temp = safetensors.torch.load_file(turbo_shard)

    keys = list(state_dict_large_temp.keys())
    for k in keys:
        if "guidance" not in k:
            merged_state_dict[k] = (state_dict_large_temp.pop(k) + state_dict_turbo_temp.pop(k)) / 2
        else:
            guidance_state_dict[k] = state_dict_large_temp.pop(k)

    # Every key should have been consumed; anything left over signals a mismatch.
    if len(state_dict_large_temp) > 0:
        raise ValueError(f"There should not be any residue but got: {list(state_dict_large_temp.keys())}.")
    if len(state_dict_turbo_temp) > 0:
        raise ValueError(f"There should not be any residue but got: {list(state_dict_turbo_temp.keys())}.")

# Add the guidance-embedding weights back, load everything into the meta model,
# and save the merged transformer locally.
merged_state_dict.update(guidance_state_dict)
load_model_dict_into_meta(model, merged_state_dict)
model.to(torch.bfloat16).save_pretrained("transformer")

# Push the merged transformer to the Hub.
upload_folder(
    repo_id="ariG23498/sd-3.5-merged",
    folder_path="transformer",
    path_in_repo="transformer",
)
```

This script downloads the transformer checkpoints, merges them, saves the merged transformer locally, and uploads it to the Hugging Face Hub with `upload_folder`.
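
To sanity-check the merge before (or instead of) uploading, one option is to load the locally saved transformer into the standard SD 3.5 pipeline; this sketch assumes the remaining components (text encoders, VAE, scheduler) are taken from the Large repository:

```python
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline
import torch

# Load the merged transformer saved by the script above ("transformer" folder)
# and drop it into the SD 3.5 pipeline, with the other components from Large.
transformer = SD3Transformer2DModel.from_pretrained("transformer", torch_dtype=torch.bfloat16)
pipeline = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```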