license: openrail++
datasets:
- friedrichor/PhotoChat_120_square_HQ
language:
- en
tags:
- stable-diffusion
- text-to-image
friedrichor/stable-diffusion-2-1-realistic is a model fine-tuned from stable-diffusion-2-1 on friedrichor/PhotoChat_120_square_HQ.
This model was not trained solely for text-to-image tasks. It is one component of the Tiger model (currently not open-source and under submission) for Multimodal Dialogue Response Generation.
Model Details
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
Dataset
friedrichor/PhotoChat_120_square_HQ was used for fine-tuning Stable Diffusion v2.1.
- 120 image-text pairs
- Images were manually screened from the PhotoChat dataset, cropped to square, and upscaled with Gigapixel to improve quality.
- Image captions were generated by BLIP-2.
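The square-cropping step can be sketched as below. This is only an illustration: the card does not say how the crops were chosen, so a centered crop is an assumption here, and `center_square_box` is a hypothetical helper name. The returned tuple follows the (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) for the largest centered square.

    Assumption: a centered crop. The dataset images may have been
    cropped differently (for example, by hand around the subject).
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# With Pillow installed, the box could be applied as:
#   square = img.crop(center_square_box(*img.size))
```

Keeping the box calculation separate from the image library makes it easy to check the geometry before touching any files.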
How to fine-tune
friedrichor/Text-to-Image-Summary/fine-tune/text2image
Simple use example
Using the 🤗 Diffusers library
import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned pipeline and move it to the GPU
device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("friedrichor/stable-diffusion-2-1-realistic", torch_dtype=torch.float32)
pipe.to(device)

# Caption, followed by the prompt template for photos of people
prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Fix the seed so the result is reproducible
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(prompt + extra_prompt,
             negative_prompt=negative_prompt,
             height=768, width=768,
             num_inference_steps=20,
             guidance_scale=7.5,
             generator=generator).images[0]
image.save("image.png")
Prompt template
Applying a prompt template helps improve image quality.
If you want to generate real-world images that include people, try the following prompt template.
{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography
If you want to generate real-world images without people, try the following prompt template.
{{caption}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.
For more prompt templates, see Dalabad/stable-diffusion-prompt-templates, r/StableDiffusion, etc.
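Filling the two templates above can be sketched as a small helper. `build_prompt` and its parameters are hypothetical names introduced for illustration, and reading `(35mm|50mm|85mm)` as "pick one focal length" is an assumption about how that part of the template is meant to be used.

```python
import random

# Templates copied from this model card; {caption} marks the caption slot.
HUMAN_TEMPLATE = (
    "{caption}, facing the camera, photograph, highly detailed face, "
    "depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, "
    "centered, extremely detailed, Nikon D850, award winning photography"
)
SCENE_TEMPLATE = (
    "{caption}, depth of field. bokeh. soft light. by Yasmin Albatoul, "
    "Harry Fayt. centered. extremely detailed. Nikon D850, {focal_length}. "
    "award winning photography."
)
# Assumption: the "(35mm|50mm|85mm)" group means "choose one of these".
FOCAL_LENGTHS = ("35mm", "50mm", "85mm")


def build_prompt(caption, with_human=True, focal_length=None, rng=None):
    """Insert a caption into the template for scenes with or without people."""
    # Drop trailing punctuation so the template's own comma follows cleanly.
    caption = caption.strip().rstrip(".,")
    if with_human:
        return HUMAN_TEMPLATE.format(caption=caption)
    if focal_length is None:
        # Pick a focal length at random; pass rng=random.Random(seed) to make it repeatable.
        focal_length = (rng or random).choice(FOCAL_LENGTHS)
    return SCENE_TEMPLATE.format(caption=caption, focal_length=focal_length)


print(build_prompt("a woman in a red and gold costume with feathers on her head"))
print(build_prompt("a cup of coffee on a table", with_human=False, focal_length="50mm"))
```

The resulting string can be passed as the prompt to the pipeline call in the usage example, in place of `prompt + extra_prompt`.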
Negative prompt
Applying a negative prompt also helps improve image quality.
For example,
cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
Hosted inference API
You can try the model with the Hosted inference API on the right by entering a prompt.
For example,
a woman in a red and gold costume with feathers on her head, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography