AdamOswald1 committed on
Commit
0148e29
1 Parent(s): 7f31634

Upload 16 files

.gitattributes CHANGED
@@ -32,3 +32,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
32
  *.zip filter=lfs diff=lfs merge=lfs -text
33
  *.zst filter=lfs diff=lfs merge=lfs -text
34
  *tfevents* filter=lfs diff=lfs merge=lfs -text
35
+ example-1.png filter=lfs diff=lfs merge=lfs -text
36
+ example-2.png filter=lfs diff=lfs merge=lfs -text
37
+ example-3.png filter=lfs diff=lfs merge=lfs -text
38
+ example-4.png filter=lfs diff=lfs merge=lfs -text
39
+ samples/1girl.png filter=lfs diff=lfs merge=lfs -text
40
+ samples/scenery.png filter=lfs diff=lfs merge=lfs -text
41
+ samples/1boy.png filter=lfs diff=lfs merge=lfs -text
42
+ 1girl.png filter=lfs diff=lfs merge=lfs -text
43
+ 1boy.png filter=lfs diff=lfs merge=lfs -text
44
+ scenery.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,254 @@
1
  ---
2
+ language:
3
+ - en
4
  license: creativeml-openrail-m
5
+ pipeline_tag: text-to-image
6
+ tags:
7
+ - stable-diffusion
8
+ - stable-diffusion-diffusers
9
+ - text-to-image
10
+ - diffusers
11
+ inference: true
12
+ widget:
13
+ - text: >-
14
+ masterpiece, best quality, 1girl, brown hair, green eyes, colorful,
15
+ autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
16
+ example_title: example 1girl
17
+ - text: >-
18
+ masterpiece, best quality, 1boy, medium hair, blonde hair, blue eyes, bishounen, colorful,
19
+ autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
20
+ example_title: example 1boy
21
  ---
22
+
23
+ <font color="grey">Thanks to [Linaqruf](https://huggingface.co/Linaqruf) for letting me borrow his model card for reference.</font>
24
+
25
+ # Anything V4
26
+
27
+ Welcome to Anything V4 - a latent diffusion model for weebs and the newest version of the Anything series. This model is intended to produce high-quality, highly detailed anime-style images with just a few prompts. Like other anime-style Stable Diffusion models, it also supports danbooru tags to generate images.
28
+
29
+ e.g. **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
30
+
31
+ I think the V4.5 version is better, though; it is also in this repo, so feel free to try it.
32
+
33
+ # Gradio
34
+
35
+ We support a [Gradio](https://github.com/gradio-app/gradio) Web UI to run anything-v4.0:
36
+ [![Open In Spaces](https://camo.githubusercontent.com/00380c35e60d6b04be65d3d94a58332be5cc93779f630bcdfc18ab9a3a7d3388/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565)](https://huggingface.co/spaces/akhaliq/anything-v4.0)
37
+
38
+ ## 🧨 Diffusers
39
+
40
+ This model can be used just like any other Stable Diffusion model. For more information,
41
+ please have a look at the [Stable Diffusion documentation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion).
42
+
43
+ You can also export the model to [ONNX](https://huggingface.co/docs/diffusers/optimization/onnx), [MPS](https://huggingface.co/docs/diffusers/optimization/mps) and/or [FLAX/JAX]().
44
+
45
+ ```python
46
+ from diffusers import StableDiffusionPipeline
47
+ import torch
48
+
49
+ model_id = "andite/anything-v4.0"
50
+ pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
51
+ pipe = pipe.to("cuda")
52
+
53
+ prompt = "hatsune_miku"
54
+ image = pipe(prompt).images[0]
55
+
56
+ image.save("./hatsune_miku.png")
57
+ ```
58
+
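+ As mentioned above, the same pipeline can also run on Apple Silicon through the MPS backend. Below is a minimal sketch, assuming a PyTorch build with MPS support; the one-step warm-up pass is the workaround recommended in the diffusers MPS guide, not something specific to this model.
+
+ ```python
+ from diffusers import StableDiffusionPipeline
+
+ # Load in full precision; half-precision support on MPS varies between PyTorch versions
+ pipe = StableDiffusionPipeline.from_pretrained("andite/anything-v4.0")
+ pipe = pipe.to("mps")
+
+ prompt = "hatsune_miku"
+ # One-step warm-up pass to avoid first-inference artifacts on MPS
+ _ = pipe(prompt, num_inference_steps=1)
+
+ image = pipe(prompt).images[0]
+ image.save("./hatsune_miku_mps.png")
+ ```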
59
+ ## Examples
60
+
61
+ Below are some examples of images generated using this model:
62
+
63
+ **Anime Girl:**
64
+ ![Anime Girl](https://huggingface.co/andite/anything-v4.0/resolve/main/example-1.png)
65
+ ```
66
+ masterpiece, best quality, 1girl, white hair, medium hair, cat ears, closed eyes, looking at viewer, :3, cute, scarf, jacket, outdoors, streets
67
+ Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7
68
+ ```
69
+ **Anime Boy:**
70
+ ![Anime Boy](https://huggingface.co/andite/anything-v4.0/resolve/main/example-2.png)
71
+ ```
72
+ 1boy, bishounen, casual, indoors, sitting, coffee shop, bokeh
73
+ Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7
74
+ ```
75
+ **Scenery:**
76
+ ![Scenery](https://huggingface.co/andite/anything-v4.0/resolve/main/example-4.png)
77
+ ```
78
+ scenery, village, outdoors, sky, clouds
79
+ Steps: 50, Sampler: DPM++ 2S a Karras, CFG scale: 7
80
+ ```
81
+
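+ The settings listed with each example come from a WebUI-style interface. A rough diffusers equivalent is sketched below: "DPM++ 2M Karras" is approximated with DPMSolverMultistepScheduler, and the `use_karras_sigmas` option assumes a reasonably recent diffusers release.
+
+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
+
+ pipe = StableDiffusionPipeline.from_pretrained("andite/anything-v4.0", torch_dtype=torch.float16)
+ # Approximate "DPM++ 2M Karras" with the multistep DPM-Solver using Karras sigmas
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
+ pipe = pipe.to("cuda")
+
+ prompt = ("masterpiece, best quality, 1girl, white hair, medium hair, cat ears, "
+           "closed eyes, looking at viewer, :3, cute, scarf, jacket, outdoors, streets")
+ # Steps: 20, CFG scale: 7 as listed for the first example above
+ image = pipe(prompt, num_inference_steps=20, guidance_scale=7).images[0]
+ image.save("example-1-reproduction.png")
+ ```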
82
+ ## License
83
+
84
+ This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
85
+ The CreativeML OpenRAIL License specifies:
86
+
87
+ 1. You can't use the model to deliberately produce or share illegal or harmful outputs or content
88
+ 2. The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license
89
+ 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully)
90
+ [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
91
+
92
+ ## Big Thanks to
93
+
94
+ - [Linaqruf](https://huggingface.co/Linaqruf), [NoCrypt](https://huggingface.co/NoCrypt), and Fannovel16#9022 for helping me out a lot with my inquiries and concerns about models and other things.
95
+
96
+ # Anything V3 - Better VAE
97
+
98
+ Welcome to Anything V3 - Better VAE. It currently has three model formats: diffusers, ckpt, and safetensors. You'll never see a grey image result again. This model is designed to produce high-quality, highly detailed anime-style images with just a few prompts. Like other anime-style Stable Diffusion models, it also supports danbooru tags for image generation.
99
+ e.g. **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
100
+
101
+ ## Gradio
102
+
103
+ We support a [Gradio](https://github.com/gradio-app/gradio) Web UI to run Anything V3 with Better VAE:
104
+
105
+ [![Open In Spaces](https://camo.githubusercontent.com/00380c35e60d6b04be65d3d94a58332be5cc93779f630bcdfc18ab9a3a7d3388/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565)](https://huggingface.co/spaces/Linaqruf/Linaqruf-anything-v3-better-vae)
106
+
107
+ ## 🧨 Diffusers
108
+
109
+ This model can be used just like any other Stable Diffusion model. For more information,
110
+ please have a look at the [Stable Diffusion documentation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion). You can also export the model to [ONNX](https://huggingface.co/docs/diffusers/optimization/onnx), [MPS](https://huggingface.co/docs/diffusers/optimization/mps) and/or [FLAX/JAX]().
111
+
112
+ You should install the dependencies below in order to run the pipeline:
113
+
114
+ ```bash
115
+ pip install diffusers transformers accelerate scipy safetensors
116
+ ```
117
+ Running the pipeline (if you don't swap the scheduler, it will run with the pipeline's default scheduler; in this example we swap it to DPMSolverMultistepScheduler):
118
+
119
+ ```python
120
+ import torch
+ from torch import autocast
+ from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
121
+
122
+ model_id = "Linaqruf/anything-v3-0-better-vae"
123
+
124
+ # Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
125
+ pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
126
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
127
+ pipe = pipe.to("cuda")
128
+
129
+ prompt = "masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden"
130
+ negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name"
131
+
132
+ with autocast("cuda"):
133
+     image = pipe(prompt,
134
+                  negative_prompt=negative_prompt,
135
+                  width=512,
136
+                  height=640,
137
+                  guidance_scale=12,
138
+                  num_inference_steps=50).images[0]
139
+
140
+ image.save("anime_girl.png")
141
+ ```
142
+
143
+ ## Examples
144
+
145
+ Below are some examples of images generated using this model:
146
+
147
+ **Anime Girl:**
148
+ ![Anime Girl](https://huggingface.co/Linaqruf/anything-v3-better-vae/resolve/main/samples/1girl.png)
149
+
150
+ **Anime Boy:**
151
+ ![Anime Boy](https://huggingface.co/Linaqruf/anything-v3-better-vae/resolve/main/samples/1boy.png)
152
+
153
+ **Scenery:**
154
+ ![Scenery](https://huggingface.co/Linaqruf/anything-v3-better-vae/resolve/main/samples/scenery.png)
155
+
156
+
157
+ ## License
158
+
159
+ This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
160
+ The CreativeML OpenRAIL License specifies:
161
+
162
+ 1. You can't use the model to deliberately produce or share illegal or harmful outputs or content
163
+ 2. The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license
164
+ 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully)
165
+ [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
166
+
167
+ # Announcement
168
+
169
+ For the (unofficial) continuation of this model, please visit [andite/anything-v4.0](https://huggingface.co/andite/anything-v4.0). I am aware that the repo exists because I am literally the one who (accidentally) gave him the idea to publish his fine-tuned model ([andite/yohan-diffusion](https://huggingface.co/andite/yohan-diffusion)) as a base, merge it with many mysterious models, and say "hey, let's call it 'Anything V4.0'", because the quality is quite similar to Anything V3 but upgraded.
170
+
171
+ I also wanted to tell you something: I have been planning to remove or make private one of the repos named "Anything V3":
172
+ - [Linaqruf/anything-v3.0](https://huggingface.co/Linaqruf/anything-v3.0/)
173
+ - [Linaqruf/anything-v3-better-vae](https://huggingface.co/Linaqruf/anything-v3-better-vae)
174
+
175
+ This is because there are two versions now, and I was late to realize that this mysterious nonsense model has been polluting the Hugging Face Trending page for so long; now that the new repo is out, it is there as well. I feel guilty every time this model is on the trending leaderboard.
176
+
177
+ I would prefer to delete or make private this one and let us slowly move to [Linaqruf/anything-v3-better-vae](https://huggingface.co/Linaqruf/anything-v3-better-vae), which has better repo management and a better VAE included in the model.
178
+
179
+ Please share your thoughts in discussion #133 about whether I should delete this repo, the other one, or maybe both of them.
180
+
181
+ Thanks,
182
+ Linaqruf.
183
+
184
+ ---
185
+
186
+ # Anything V3
187
+
188
+ Welcome to Anything V3 - a latent diffusion model for weebs. This model is intended to produce high-quality, highly detailed anime-style images with just a few prompts. Like other anime-style Stable Diffusion models, it also supports danbooru tags to generate images.
189
+
190
+ e.g. **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
191
+
192
+ ## Gradio
193
+
194
+ We support a [Gradio](https://github.com/gradio-app/gradio) Web UI to run Anything-V3.0:
195
+
196
+ [Open in Spaces](https://huggingface.co/spaces/akhaliq/anything-v3.0)
197
+
198
+
199
+
200
+ ## 🧨 Diffusers
201
+
202
+ This model can be used just like any other Stable Diffusion model. For more information,
203
+ please have a look at the [Stable Diffusion documentation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion).
204
+
205
+ You can also export the model to [ONNX](https://huggingface.co/docs/diffusers/optimization/onnx), [MPS](https://huggingface.co/docs/diffusers/optimization/mps) and/or [FLAX/JAX]().
206
+
207
+ ```python
208
+ from diffusers import StableDiffusionPipeline
209
+ import torch
210
+
211
+ model_id = "Linaqruf/anything-v3.0"
212
+ branch_name= "diffusers"
213
+
214
+ pipe = StableDiffusionPipeline.from_pretrained(model_id, revision=branch_name, torch_dtype=torch.float16)
215
+ pipe = pipe.to("cuda")
216
+
217
+ prompt = "pikachu"
218
+ image = pipe(prompt).images[0]
219
+
220
+ image.save("./pikachu.png")
221
+ ```
222
+
223
+ ## Examples
224
+
225
+ Below are some examples of images generated using this model:
226
+
227
+ **Anime Girl:**
228
+ ![Anime Girl](https://huggingface.co/Linaqruf/anything-v3.0/resolve/main/1girl.png)
229
+ ```
230
+ 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
231
+ Steps: 50, Sampler: DDIM, CFG scale: 12
232
+ ```
233
+ **Anime Boy:**
234
+ ![Anime Boy](https://huggingface.co/Linaqruf/anything-v3.0/resolve/main/1boy.png)
235
+ ```
236
+ 1boy, medium hair, blonde hair, blue eyes, bishounen, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
237
+ Steps: 50, Sampler: DDIM, CFG scale: 12
238
+ ```
239
+ **Scenery:**
240
+ ![Scenery](https://huggingface.co/Linaqruf/anything-v3.0/resolve/main/scenery.png)
241
+ ```
242
+ scenery, shibuya tokyo, post-apocalypse, ruins, rust, sky, skyscraper, abandoned, blue sky, broken window, building, cloud, crane machine, outdoors, overgrown, pillar, sunset
243
+ Steps: 50, Sampler: DDIM, CFG scale: 12
244
+ ```
245
+
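+ The example images above were generated with DDIM at 50 steps and CFG scale 12. Below is a minimal diffusers sketch with comparable settings; the explicit swap to DDIMScheduler is an assumption made to match the listed sampler, since the pipeline may ship with a different default scheduler.
+
+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline, DDIMScheduler
+
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "Linaqruf/anything-v3.0", revision="diffusers", torch_dtype=torch.float16
+ )
+ # Match the "Sampler: DDIM" setting used for the sample images
+ pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
+ pipe = pipe.to("cuda")
+
+ prompt = "1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden"
+ # Steps: 50, CFG scale: 12 as listed above
+ image = pipe(prompt, num_inference_steps=50, guidance_scale=12).images[0]
+ image.save("1girl-reproduction.png")
+ ```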
246
+ ## License
247
+
248
+ This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
249
+ The CreativeML OpenRAIL License specifies:
250
+
251
+ 1. You can't use the model to deliberately produce or share illegal or harmful outputs or content
252
+ 2. The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license
253
+ 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully)
254
+ [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "crop_size": {
3
+ "height": 224,
4
+ "width": 224
5
+ },
6
+ "do_center_crop": true,
7
+ "do_convert_rgb": true,
8
+ "do_normalize": true,
9
+ "do_rescale": true,
10
+ "do_resize": true,
11
+ "feature_extractor_type": "CLIPFeatureExtractor",
12
+ "image_mean": [
13
+ 0.48145466,
14
+ 0.4578275,
15
+ 0.40821073
16
+ ],
17
+ "image_processor_type": "CLIPFeatureExtractor",
18
+ "image_std": [
19
+ 0.26862954,
20
+ 0.26130258,
21
+ 0.27577711
22
+ ],
23
+ "resample": 3,
24
+ "rescale_factor": 0.00392156862745098,
25
+ "size": {
26
+ "shortest_edge": 224
27
+ }
28
+ }
model_index.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "_class_name": "StableDiffusionPipeline",
3
+ "_diffusers_version": "0.12.0.dev0",
4
+ "feature_extractor": [
5
+ "transformers",
6
+ "CLIPFeatureExtractor"
7
+ ],
8
+ "requires_safety_checker": true,
9
+ "safety_checker": [
10
+ "stable_diffusion",
11
+ "StableDiffusionSafetyChecker"
12
+ ],
13
+ "scheduler": [
14
+ "diffusers",
15
+ "PNDMScheduler"
16
+ ],
17
+ "text_encoder": [
18
+ "transformers",
19
+ "CLIPTextModel"
20
+ ],
21
+ "tokenizer": [
22
+ "transformers",
23
+ "CLIPTokenizer"
24
+ ],
25
+ "unet": [
26
+ "diffusers",
27
+ "UNet2DConditionModel"
28
+ ],
29
+ "vae": [
30
+ "diffusers",
31
+ "AutoencoderKL"
32
+ ]
33
+ }
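model_index.json lists the components that make up the StableDiffusionPipeline, and each entry corresponds to one of the sub-folders added in this commit. As a minimal sketch, the components can also be loaded individually with the standard diffusers/transformers `subfolder` argument; the repo id below is a placeholder, not the actual name of this repository.

```python
from diffusers import UNet2DConditionModel, AutoencoderKL, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

repo_id = "your-username/your-model"  # placeholder repo id

unet = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(repo_id, subfolder="vae")
scheduler = PNDMScheduler.from_pretrained(repo_id, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained(repo_id, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
```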
safety_checker/config.json ADDED
@@ -0,0 +1,181 @@
1
+ {
2
+ "_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
3
+ "_name_or_path": "CompVis/stable-diffusion-safety-checker",
4
+ "architectures": [
5
+ "StableDiffusionSafetyChecker"
6
+ ],
7
+ "initializer_factor": 1.0,
8
+ "logit_scale_init_value": 2.6592,
9
+ "model_type": "clip",
10
+ "projection_dim": 768,
11
+ "text_config": {
12
+ "_name_or_path": "",
13
+ "add_cross_attention": false,
14
+ "architectures": null,
15
+ "attention_dropout": 0.0,
16
+ "bad_words_ids": null,
17
+ "begin_suppress_tokens": null,
18
+ "bos_token_id": 0,
19
+ "chunk_size_feed_forward": 0,
20
+ "cross_attention_hidden_size": null,
21
+ "decoder_start_token_id": null,
22
+ "diversity_penalty": 0.0,
23
+ "do_sample": false,
24
+ "dropout": 0.0,
25
+ "early_stopping": false,
26
+ "encoder_no_repeat_ngram_size": 0,
27
+ "eos_token_id": 2,
28
+ "exponential_decay_length_penalty": null,
29
+ "finetuning_task": null,
30
+ "forced_bos_token_id": null,
31
+ "forced_eos_token_id": null,
32
+ "hidden_act": "quick_gelu",
33
+ "hidden_size": 768,
34
+ "id2label": {
35
+ "0": "LABEL_0",
36
+ "1": "LABEL_1"
37
+ },
38
+ "initializer_factor": 1.0,
39
+ "initializer_range": 0.02,
40
+ "intermediate_size": 3072,
41
+ "is_decoder": false,
42
+ "is_encoder_decoder": false,
43
+ "label2id": {
44
+ "LABEL_0": 0,
45
+ "LABEL_1": 1
46
+ },
47
+ "layer_norm_eps": 1e-05,
48
+ "length_penalty": 1.0,
49
+ "max_length": 20,
50
+ "max_position_embeddings": 77,
51
+ "min_length": 0,
52
+ "model_type": "clip_text_model",
53
+ "no_repeat_ngram_size": 0,
54
+ "num_attention_heads": 12,
55
+ "num_beam_groups": 1,
56
+ "num_beams": 1,
57
+ "num_hidden_layers": 12,
58
+ "num_return_sequences": 1,
59
+ "output_attentions": false,
60
+ "output_hidden_states": false,
61
+ "output_scores": false,
62
+ "pad_token_id": 1,
63
+ "prefix": null,
64
+ "problem_type": null,
65
+ "projection_dim": 512,
66
+ "pruned_heads": {},
67
+ "remove_invalid_values": false,
68
+ "repetition_penalty": 1.0,
69
+ "return_dict": true,
70
+ "return_dict_in_generate": false,
71
+ "sep_token_id": null,
72
+ "suppress_tokens": null,
73
+ "task_specific_params": null,
74
+ "temperature": 1.0,
75
+ "tf_legacy_loss": false,
76
+ "tie_encoder_decoder": false,
77
+ "tie_word_embeddings": true,
78
+ "tokenizer_class": null,
79
+ "top_k": 50,
80
+ "top_p": 1.0,
81
+ "torch_dtype": null,
82
+ "torchscript": false,
83
+ "transformers_version": "4.26.0.dev0",
84
+ "typical_p": 1.0,
85
+ "use_bfloat16": false,
86
+ "vocab_size": 49408
87
+ },
88
+ "text_config_dict": {
89
+ "hidden_size": 768,
90
+ "intermediate_size": 3072,
91
+ "num_attention_heads": 12,
92
+ "num_hidden_layers": 12
93
+ },
94
+ "torch_dtype": "float32",
95
+ "transformers_version": null,
96
+ "vision_config": {
97
+ "_name_or_path": "",
98
+ "add_cross_attention": false,
99
+ "architectures": null,
100
+ "attention_dropout": 0.0,
101
+ "bad_words_ids": null,
102
+ "begin_suppress_tokens": null,
103
+ "bos_token_id": null,
104
+ "chunk_size_feed_forward": 0,
105
+ "cross_attention_hidden_size": null,
106
+ "decoder_start_token_id": null,
107
+ "diversity_penalty": 0.0,
108
+ "do_sample": false,
109
+ "dropout": 0.0,
110
+ "early_stopping": false,
111
+ "encoder_no_repeat_ngram_size": 0,
112
+ "eos_token_id": null,
113
+ "exponential_decay_length_penalty": null,
114
+ "finetuning_task": null,
115
+ "forced_bos_token_id": null,
116
+ "forced_eos_token_id": null,
117
+ "hidden_act": "quick_gelu",
118
+ "hidden_size": 1024,
119
+ "id2label": {
120
+ "0": "LABEL_0",
121
+ "1": "LABEL_1"
122
+ },
123
+ "image_size": 224,
124
+ "initializer_factor": 1.0,
125
+ "initializer_range": 0.02,
126
+ "intermediate_size": 4096,
127
+ "is_decoder": false,
128
+ "is_encoder_decoder": false,
129
+ "label2id": {
130
+ "LABEL_0": 0,
131
+ "LABEL_1": 1
132
+ },
133
+ "layer_norm_eps": 1e-05,
134
+ "length_penalty": 1.0,
135
+ "max_length": 20,
136
+ "min_length": 0,
137
+ "model_type": "clip_vision_model",
138
+ "no_repeat_ngram_size": 0,
139
+ "num_attention_heads": 16,
140
+ "num_beam_groups": 1,
141
+ "num_beams": 1,
142
+ "num_channels": 3,
143
+ "num_hidden_layers": 24,
144
+ "num_return_sequences": 1,
145
+ "output_attentions": false,
146
+ "output_hidden_states": false,
147
+ "output_scores": false,
148
+ "pad_token_id": null,
149
+ "patch_size": 14,
150
+ "prefix": null,
151
+ "problem_type": null,
152
+ "projection_dim": 512,
153
+ "pruned_heads": {},
154
+ "remove_invalid_values": false,
155
+ "repetition_penalty": 1.0,
156
+ "return_dict": true,
157
+ "return_dict_in_generate": false,
158
+ "sep_token_id": null,
159
+ "suppress_tokens": null,
160
+ "task_specific_params": null,
161
+ "temperature": 1.0,
162
+ "tf_legacy_loss": false,
163
+ "tie_encoder_decoder": false,
164
+ "tie_word_embeddings": true,
165
+ "tokenizer_class": null,
166
+ "top_k": 50,
167
+ "top_p": 1.0,
168
+ "torch_dtype": null,
169
+ "torchscript": false,
170
+ "transformers_version": "4.26.0.dev0",
171
+ "typical_p": 1.0,
172
+ "use_bfloat16": false
173
+ },
174
+ "vision_config_dict": {
175
+ "hidden_size": 1024,
176
+ "intermediate_size": 4096,
177
+ "num_attention_heads": 16,
178
+ "num_hidden_layers": 24,
179
+ "patch_size": 14
180
+ }
181
+ }
samples/1boy.png ADDED

Git LFS Details

  • SHA256: 977feb68eb35518fa11a4977d6f5658c724536da266da6ca2c998ed9bb5544d7
  • Pointer size: 132 Bytes
  • Size of remote file: 2.38 MB
samples/1girl.png ADDED

Git LFS Details

  • SHA256: ed516d2bde40f457bddbe9143d6c5175ed8e8954b3a254269835dd359f5322a1
  • Pointer size: 132 Bytes
  • Size of remote file: 2.53 MB
samples/scenery.png ADDED

Git LFS Details

  • SHA256: 02513474ee2b635407c1fa9192890d9049da9b6cb76a56cf765ff4cd5fa45d45
  • Pointer size: 132 Bytes
  • Size of remote file: 1.29 MB
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "_class_name": "PNDMScheduler",
3
+ "_diffusers_version": "0.12.0.dev0",
4
+ "beta_end": 0.012,
5
+ "beta_schedule": "scaled_linear",
6
+ "beta_start": 0.00085,
7
+ "clip_sample": false,
8
+ "num_train_timesteps": 1000,
9
+ "prediction_type": "epsilon",
10
+ "set_alpha_to_one": false,
11
+ "skip_prk_steps": true,
12
+ "steps_offset": 1,
13
+ "trained_betas": null
14
+ }
text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "_name_or_path": "openai/clip-vit-large-patch14",
3
+ "architectures": [
4
+ "CLIPTextModel"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 0,
8
+ "dropout": 0.0,
9
+ "eos_token_id": 2,
10
+ "hidden_act": "quick_gelu",
11
+ "hidden_size": 768,
12
+ "initializer_factor": 1.0,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 3072,
15
+ "layer_norm_eps": 1e-05,
16
+ "max_position_embeddings": 77,
17
+ "model_type": "clip_text_model",
18
+ "num_attention_heads": 12,
19
+ "num_hidden_layers": 12,
20
+ "pad_token_id": 1,
21
+ "projection_dim": 768,
22
+ "torch_dtype": "float32",
23
+ "transformers_version": "4.26.0.dev0",
24
+ "vocab_size": 49408
25
+ }
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|startoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "bos_token": {
4
+ "__type": "AddedToken",
5
+ "content": "<|startoftext|>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false
10
+ },
11
+ "do_lower_case": true,
12
+ "eos_token": {
13
+ "__type": "AddedToken",
14
+ "content": "<|endoftext|>",
15
+ "lstrip": false,
16
+ "normalized": true,
17
+ "rstrip": false,
18
+ "single_word": false
19
+ },
20
+ "errors": "replace",
21
+ "model_max_length": 77,
22
+ "name_or_path": "runwayml/stable-diffusion-v1-5",
23
+ "pad_token": "<|endoftext|>",
24
+ "special_tokens_map_file": "./special_tokens_map.json",
25
+ "tokenizer_class": "CLIPTokenizer",
26
+ "unk_token": {
27
+ "__type": "AddedToken",
28
+ "content": "<|endoftext|>",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false
33
+ },
34
+ "errors": "replace",
35
+ "model_max_length": 77,
36
+ "name_or_path": "openai/clip-vit-large-patch14",
37
+ "pad_token": "<|endoftext|>",
38
+ "special_tokens_map_file": "./special_tokens_map.json",
39
+ "tokenizer_class": "CLIPTokenizer",
40
+ "unk_token": {
41
+ "__type": "AddedToken",
42
+ "content": "<|endoftext|>",
43
+ "lstrip": false,
44
+ "normalized": true,
45
+ "rstrip": false,
46
+ "single_word": false
47
+ }
48
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
unet/config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "_class_name": "UNet2DConditionModel",
3
+ "_diffusers_version": "0.12.0.dev0",
4
+ "act_fn": "silu",
5
+ "attention_head_dim": 8,
6
+ "block_out_channels": [
7
+ 320,
8
+ 640,
9
+ 1280,
10
+ 1280
11
+ ],
12
+ "center_input_sample": false,
13
+ "class_embed_type": null,
14
+ "cross_attention_dim": 768,
15
+ "down_block_types": [
16
+ "CrossAttnDownBlock2D",
17
+ "CrossAttnDownBlock2D",
18
+ "CrossAttnDownBlock2D",
19
+ "DownBlock2D"
20
+ ],
21
+ "downsample_padding": 1,
22
+ "dual_cross_attention": false,
23
+ "flip_sin_to_cos": true,
24
+ "freq_shift": 0,
25
+ "in_channels": 4,
26
+ "layers_per_block": 2,
27
+ "mid_block_scale_factor": 1,
28
+ "mid_block_type": "UNetMidBlock2DCrossAttn",
29
+ "norm_eps": 1e-05,
30
+ "norm_num_groups": 32,
31
+ "num_class_embeds": null,
32
+ "only_cross_attention": false,
33
+ "out_channels": 4,
34
+ "resnet_time_scale_shift": "default",
35
+ "sample_size": 64,
36
+ "up_block_types": [
37
+ "UpBlock2D",
38
+ "CrossAttnUpBlock2D",
39
+ "CrossAttnUpBlock2D",
40
+ "CrossAttnUpBlock2D"
41
+ ],
42
+ "upcast_attention": false,
43
+ "use_linear_projection": false
44
+ }
vae/config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "_class_name": "AutoencoderKL",
3
+ "_diffusers_version": "0.12.0.dev0",
4
+ "act_fn": "silu",
5
+ "block_out_channels": [
6
+ 128,
7
+ 256,
8
+ 512,
9
+ 512
10
+ ],
11
+ "down_block_types": [
12
+ "DownEncoderBlock2D",
13
+ "DownEncoderBlock2D",
14
+ "DownEncoderBlock2D",
15
+ "DownEncoderBlock2D"
16
+ ],
17
+ "in_channels": 3,
18
+ "latent_channels": 4,
19
+ "layers_per_block": 2,
20
+ "norm_num_groups": 32,
21
+ "out_channels": 3,
22
+ "sample_size": 512,
23
+ "up_block_types": [
24
+ "UpDecoderBlock2D",
25
+ "UpDecoderBlock2D",
26
+ "UpDecoderBlock2D",
27
+ "UpDecoderBlock2D"
28
+ ]
29
+ }