Stable Diffusion v2.1 fine-tuned on the text-image dataset friedrichor/PhotoChat_120_square_HQ.
Model Details
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- Fine-tuning dataset: friedrichor/PhotoChat_120_square_HQ
Dataset
friedrichor/PhotoChat_120_square_HQ was used for fine-tuning Stable Diffusion v2.1.
The dataset contains 120 image-text pairs.
Images were manually screened from the PhotoChat dataset, cropped to square, and enhanced with Gigapixel to improve their quality.
Image captions were generated by BLIP-2.
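The exact cropping procedure is not described here. As a minimal sketch, assuming a simple center crop (one common way to make an image square), the crop box can be computed as follows. The function name `center_square_box` is hypothetical and is not part of the dataset tooling.

```python
def center_square_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square.

    Assumption: a plain center crop; the dataset authors may have cropped differently.
    """
    side = min(width, height)        # the square side is the shorter edge
    left = (width - side) // 2       # horizontal offset that centers the square
    upper = (height - side) // 2     # vertical offset that centers the square
    return (left, upper, left + side, upper + side)


if __name__ == "__main__":
    print(center_square_box(1024, 768))
```

With Pillow, a box in this form can be passed to `Image.crop`, for example `image.crop(center_square_box(*image.size))`.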
How to fine-tune
See friedrichor/Text-to-Image-Summary/fine-tune/text2image.
Simple usage example
import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned pipeline and move it to the GPU.
device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("friedrichor/stable-diffusion-v2.1-portraiture", torch_dtype=torch.float32)
pipe.to(device)

# Caption, portrait prompt template suffix, and negative prompt.
prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Seed the generator so the result is reproducible.
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(prompt + extra_prompt,
             negative_prompt=negative_prompt,
             height=768, width=768,
             num_inference_steps=20,
             guidance_scale=7.5,
             generator=generator).images[0]
image.save("image.png")
Prompt template
Applying a prompt template helps improve image quality.
To generate realistic images of people, try the following prompt template.
{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography
To generate realistic images without people, try the following prompt template.
{{caption}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.
For more prompt templates, see Dalabad/stable-diffusion-prompt-templates, r/StableDiffusion, etc.
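The two templates above can be filled in programmatically. The following is a minimal sketch: the names `PERSON_TEMPLATE`, `SCENE_TEMPLATE`, and `build_prompt` are hypothetical, and it assumes that `(35mm|50mm|85mm)` means "pick one focal length", which the template itself does not state.

```python
# Template for realistic images of people (text copied from the model card).
PERSON_TEMPLATE = (
    "{caption}, facing the camera, photograph, highly detailed face, "
    "depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, "
    "centered, extremely detailed, Nikon D850, award winning photography"
)

# Template for realistic images without people. The focal length slot replaces
# "(35mm|50mm|85mm)"; treating it as a single choice is an assumption.
SCENE_TEMPLATE = (
    "{caption}, depth of field. bokeh. soft light. by Yasmin Albatoul, "
    "Harry Fayt. centered. extremely detailed. Nikon D850, {focal_length}. "
    "award winning photography."
)


def build_prompt(caption, with_person=True, focal_length="50mm"):
    """Insert a caption into one of the two prompt templates."""
    # Drop surrounding whitespace and trailing punctuation so the
    # template's own separator is not doubled.
    caption = caption.strip().rstrip(",.")
    if with_person:
        return PERSON_TEMPLATE.format(caption=caption)
    return SCENE_TEMPLATE.format(caption=caption, focal_length=focal_length)


if __name__ == "__main__":
    print(build_prompt("a woman in a red and gold costume with feathers on her head"))
    print(build_prompt("a bowl of fruit on a table", with_person=False, focal_length="85mm"))
```

The returned string can be passed directly as the prompt argument of the pipeline call in the usage example.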
Negative prompt
Applying a negative prompt also helps improve image quality.
For example:
cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
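The `(terms:weight)` groups in this negative prompt follow an emphasis convention used by some Stable Diffusion web UIs. As far as I know, a plain `StableDiffusionPipeline` call does not interpret that syntax and passes it to the tokenizer as literal text. The sketch below shows one way to split such a prompt into terms and weights, or to strip the weights entirely. The helpers `parse_weighted_prompt` and `strip_weights` are hypothetical, and the parser assumes flat, non-nested groups as in the example above.

```python
import re

# Matches one flat "(terms:weight)" group, e.g. "(blurry, fuzzy:1.2)".
_WEIGHTED_GROUP = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def _split_terms(text, weight):
    """Split comma-separated text into (term, weight) pairs, skipping blanks."""
    return [(t.strip(), weight) for t in text.split(",") if t.strip()]


def parse_weighted_prompt(prompt):
    """Return (term, weight) pairs; terms outside any group get weight 1.0."""
    pairs = []
    pos = 0
    for match in _WEIGHTED_GROUP.finditer(prompt):
        # Text between the previous group and this one is unweighted.
        pairs += _split_terms(prompt[pos:match.start()], 1.0)
        pairs += _split_terms(match.group(1), float(match.group(2)))
        pos = match.end()
    # Remaining text after the last group is unweighted.
    pairs += _split_terms(prompt[pos:], 1.0)
    return pairs


def strip_weights(prompt):
    """Return the prompt as a plain comma-separated list without weights."""
    return ", ".join(term for term, _ in parse_weighted_prompt(prompt))


if __name__ == "__main__":
    example = "cartoon, (blurry, fuzzy:1.2), disconnected hands"
    print(parse_weighted_prompt(example))
    print(strip_weights(example))
```

Whether stripping the weights improves or degrades results with this model is not something the model card states; the helper only makes the prompt text explicit about what the tokenizer will see.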