Stable Diffusion v2.1 fine-tuned on the text-image dataset `friedrichor/PhotoChat_120_square_HQ`.

# Model Details

- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- Fine-tuning dataset: [friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ)

## Dataset

[friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ) was used to fine-tune Stable Diffusion v2.1.

The dataset contains 120 image-text pairs.

The images were manually selected from the [PhotoChat](https://aclanthology.org/2021.acl-long.479/) dataset, cropped to square, and enhanced with `Gigapixel` to improve their quality.

The image captions were generated by [BLIP-2](https://arxiv.org/abs/2301.12597).

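The square-cropping step can be illustrated with a small helper. This is only a sketch: it assumes a center crop, whereas the source does not say how the crop region was chosen for each image, and the function name `center_square_box` is hypothetical.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) box for the largest centered square.

    Assumption: a center crop. The original images may have been cropped
    around the subject instead.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# A 1200x800 landscape image keeps its full height and loses 200 px per side.
box = center_square_box(1200, 800)
```

The returned tuple follows the `(left, upper, right, lower)` convention that Pillow's `Image.crop` accepts, so it can be passed to that method directly if Pillow is used for the actual cropping.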
## How to fine-tune

See [friedrichor/Text-to-Image-Summary/fine-tune/text2image](https://github.com/friedrichor/Text-to-Image-Summary/tree/main/fine-tune/text2image) or the [Hugging Face diffusers](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) text-to-image examples.

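As a rough illustration of the diffusers route, the sketch below assembles a launch command for the `train_text_to_image.py` script from the diffusers text-to-image examples. The hyperparameters (steps, learning rate, batch size, resolution) and the output directory are placeholder assumptions, not the values used to train this model, and the helper name `build_finetune_command` is hypothetical. Depending on the dataset's column names, the script's `--image_column` and `--caption_column` options may also be needed.

```python
import shlex
from typing import List


def build_finetune_command(
    output_dir: str = "sd21-portraiture",  # placeholder output path
    resolution: int = 768,                 # assumed; matches the 768x768 usage example
    max_train_steps: int = 1000,           # placeholder, not the author's setting
    learning_rate: float = 1e-5,           # placeholder, not the author's setting
) -> List[str]:
    """Build the argv list for the diffusers text-to-image training script."""
    return [
        "accelerate", "launch", "train_text_to_image.py",
        "--pretrained_model_name_or_path", "stabilityai/stable-diffusion-2-1",
        "--dataset_name", "friedrichor/PhotoChat_120_square_HQ",
        "--resolution", str(resolution),
        "--train_batch_size", "1",
        "--max_train_steps", str(max_train_steps),
        "--learning_rate", str(learning_rate),
        "--output_dir", output_dir,
    ]


# Print a shell-ready command line to copy into a terminal.
print(shlex.join(build_finetune_command()))
```

Building the command as a list keeps each flag next to its value and leaves quoting to `shlex.join`; the same list can be handed to `subprocess.run` from the directory that contains the training script.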
# Simple use example

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned pipeline and move it to the GPU.
device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("friedrichor/stable-diffusion-v2.1-portraiture", torch_dtype=torch.float32)
pipe.to(device)

# The caption is combined with the portrait prompt template (see "Prompt template").
prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Fix the seed so the result is reproducible.
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(
    prompt + extra_prompt,
    negative_prompt=negative_prompt,
    height=768,
    width=768,
    num_inference_steps=20,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save("image.png")
```

## Prompt template

**Applying a prompt template helps improve image quality.**

To generate real-world images that include people, try the following prompt template.

```
{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography
```

To generate real-world images without people, try the following prompt template.

```
{{caption}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.
```

For more prompt templates, see [Dalabad/stable-diffusion-prompt-templates](https://github.com/Dalabad/stable-diffusion-prompt-templates), [r/StableDiffusion](https://www.reddit.com/r/StableDiffusion/), etc.

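Filling the `{{caption}}` slot can be automated with a small helper. The sketch below copies the two template suffixes from this section; the names `build_prompt`, `PORTRAIT_SUFFIX`, and `SCENE_SUFFIX` are hypothetical, and reading `(35mm|50mm|85mm)` as "pick one focal length" is an assumption.

```python
# Suffixes taken from the two prompt templates in this section.
PORTRAIT_SUFFIX = (
    ", facing the camera, photograph, highly detailed face, depth of field, "
    "moody light, style by Yasmin Albatoul, Harry Fayt, centered, "
    "extremely detailed, Nikon D850, award winning photography"
)
SCENE_SUFFIX = (
    ", depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. "
    "centered. extremely detailed. Nikon D850, {focal_length}. "
    "award winning photography."
)


def build_prompt(caption: str, with_person: bool = True, focal_length: str = "50mm") -> str:
    """Append the matching template suffix to a caption.

    Assumption: the "(35mm|50mm|85mm)" part of the second template means
    that one of the three focal lengths should be chosen.
    """
    # Drop surrounding whitespace and trailing punctuation so the suffix joins cleanly.
    caption = caption.strip().rstrip(",.")
    if with_person:
        return caption + PORTRAIT_SUFFIX
    return caption + SCENE_SUFFIX.format(focal_length=focal_length)


portrait_prompt = build_prompt("a woman in a red and gold costume with feathers on her head")
scene_prompt = build_prompt("a cup of coffee on a table.", with_person=False, focal_length="85mm")
```

Either string can be passed as the first argument of the pipeline call in place of `prompt + extra_prompt`.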
## Negative prompt

**Applying a negative prompt also helps improve image quality.**

For example:

```
cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
```
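
The `(terms:1.3)` groups resemble the attention-weight notation used by some Stable Diffusion web UIs. As far as I know, the stock `StableDiffusionPipeline` passes them to the tokenizer as ordinary text rather than interpreting the numbers as weights, so it can be useful to see which terms carry which weight, or to produce a plain version of the prompt. The sketch below does both; the helper names are hypothetical and the regular expression only handles the flat, non-nested groups shown in the example.

```python
import re
from typing import List, Tuple

# Matches flat groups such as "(blurry, fuzzy:1.2)": text, a colon, then a number.
WEIGHTED_GROUP = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def split_weighted_prompt(prompt: str) -> List[Tuple[str, float]]:
    """Split a prompt into (text, weight) pieces; unweighted text gets 1.0."""
    pieces = []
    pos = 0
    for match in WEIGHTED_GROUP.finditer(prompt):
        # Text between the previous group and this one is unweighted.
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            pieces.append((plain, 1.0))
        pieces.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        pieces.append((tail, 1.0))
    return pieces


def strip_weights(prompt: str) -> str:
    """Return the prompt with parentheses and weights removed."""
    return ", ".join(text for text, _ in split_weighted_prompt(prompt))


example = "cartoon, anime, ugly, (blurry, fuzzy:1.2), (mutated hands and fingers:1.5), disconnected limbs"
pieces = split_weighted_prompt(example)
plain_negative_prompt = strip_weights(example)
```

The plain string can be passed as `negative_prompt` as-is; applying the weights themselves would need a prompt-weighting tool, which is outside the scope of this sketch.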