
# Prompt Guide

All examples are generated with a CFG scale of $4.2$ and $50$ steps, and are not cherry-picked unless otherwise stated. The negative prompt is set to:

monochrome, greyscale, low-res, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation
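These defaults can be kept in one place and reused across calls. A minimal sketch; the `default_generation_kwargs` helper and the diffusers-style keyword names (`guidance_scale`, `num_inference_steps`, `negative_prompt`) are assumptions for illustration, not the official OneDiffusion API:

```python
# Default sampling settings from this guide. The dict keys mirror
# diffusers-style pipeline kwargs (an assumption, not the official API).
NEGATIVE_PROMPT = (
    "monochrome, greyscale, low-res, bad anatomy, bad hands, text, error, "
    "missing fingers, extra digit, fewer digits, cropped, worst quality, "
    "low quality, normal quality, jpeg artifacts, signature, watermark, "
    "username, blurry, artist name, poorly drawn, bad anatomy, wrong anatomy, "
    "extra limb, missing limb, floating limbs, disconnected limbs, mutation, "
    "mutated, ugly, disgusting, blurry, amputation"
)

def default_generation_kwargs() -> dict:
    """Return the guide's recommended sampling settings."""
    return {
        "guidance_scale": 4.2,      # CFG used for all examples in this guide
        "num_inference_steps": 50,  # step count used for all examples
        "negative_prompt": NEGATIVE_PROMPT,
    }
```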

## 1. Text-to-Image

### 1.1 Long and detailed prompts give (much) better results

Since our training data comprised long and detailed prompts, the model is more likely to generate better images when given detailed prompts.

The model shows good text adherence with long and complex prompts, as in the images below. We use the first $20$ prompts from simoryu's examples. For the detailed prompts and the results of other models, refer to the link above.

Text-to-Image results

### 1.2 Resolution

The model generally works well with height and width in the range $[768; 1280]$ (height and width must be divisible by 16) for text-to-image. For other tasks, it performs best at resolutions around $512$.
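A small helper can enforce these constraints before calling the pipeline. The `snap_resolution` name and the rounding strategy (clamp to the range, then round down to a multiple of 16) are assumptions for illustration:

```python
def snap_resolution(height: int, width: int, multiple: int = 16,
                    lo: int = 768, hi: int = 1280) -> tuple[int, int]:
    """Clamp a text-to-image resolution to [lo, hi] and round each side
    down to the nearest multiple of `multiple` (16, per this guide).

    For non-text-to-image tasks, pass a range centered around 512 instead.
    """
    def snap(x: int) -> int:
        x = max(lo, min(hi, x))   # keep the side inside the supported range
        return x - (x % multiple)  # round down to a multiple of 16
    return snap(height), snap(width)
```

For example, `snap_resolution(1000, 900)` yields `(992, 896)`, both divisible by 16 and within range.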

## 2. ID Customization & Subject-driven generation

- The expected length of source captions is $30$ to $75$ words. Empirically, we find that longer prompts can help preserve the ID better, but they might hinder text adherence for the target caption.

- We find it better to add some descriptions (e.g., from the source caption) to the target caption to preserve the identity, especially for complex subjects with delicate details.

Ablation of the ID customization task

## 3. Multiview generation

We recommend not using captions that describe facial features (e.g., "looking at the camera") to mitigate multi-faced/Janus problems.

## 4. Image editing

We find it is generally better to set the guidance scale to a lower value, e.g., $[3; 3.5]$, to avoid over-saturated results.

## 5. Special tokens and available colors

### 5.1 Task Tokens

| Task | Token | Additional Tokens |
|---|---|---|
| Text to Image | `[[text2image]]` | |
| Deblurring | `[[deblurring]]` | |
| Inpainting | `[[image_inpainting]]` | |
| Canny-edge and Image | `[[canny2image]]` | |
| Depth and Image | `[[depth2image]]` | |
| Hed and Image | `[[hed2img]]` | |
| Pose and Image | `[[pose2image]]` | |
| Image editing with Instruction | `[[image_editing]]` | |
| Semantic map and Image | `[[semanticmap2image]]` | `<#00FFFF cyan mask: object/to/segment>` |
| Boundingbox and Image | `[[boundingbox2image]]` | `<#00FFFF cyan boundingbox: object/to/detect>` |
| ID customization | `[[faceid]]` | `[[img0]] target/caption [[img1]] caption/of/source/image_1 [[img2]] caption/of/source/image_2 [[img3]] caption/of/source/image_3` |
| Multiview | `[[multiview]]` | |
| Subject-Driven | `[[subject_driven]]` | `<item: name/of/subject> [[img0]] target/caption/goes/here [[img1]] insert/source/caption` |
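Assembling these token sequences by hand is error-prone, so a small builder can help. A sketch for the ID-customization layout shown above; the `build_faceid_prompt` name is an assumption for illustration:

```python
def build_faceid_prompt(target_caption: str, source_captions: list[str]) -> str:
    """Assemble an ID-customization prompt following the token layout above:
    [[faceid]] [[img0]] <target caption> [[img1]] <source caption 1> ...
    """
    parts = ["[[faceid]]", "[[img0]]", target_caption]
    for i, caption in enumerate(source_captions, start=1):
        parts.append(f"[[img{i}]]")  # one [[imgN]] tag per source image
        parts.append(caption)
    return " ".join(parts)
```

For example, `build_faceid_prompt("a photo of the person hiking", ["portrait of a man"])` produces `[[faceid]] [[img0]] a photo of the person hiking [[img1]] portrait of a man`.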

Note that you can replace the cyan color above with any color from the table below, and you can include multiple additional tokens to detect/segment multiple classes.

### 5.2 Available colors

| Hex Code | Color Name |
|---|---|
| `#FF0000` | red |
| `#00FF00` | lime |
| `#0000FF` | blue |
| `#FFFF00` | yellow |
| `#FF00FF` | magenta |
| `#00FFFF` | cyan |
| `#FFA500` | orange |
| `#800080` | purple |
| `#A52A2A` | brown |
| `#008000` | green |
| `#FFC0CB` | pink |
| `#008080` | teal |
| `#FF8C00` | darkorange |
| `#8A2BE2` | blueviolet |
| `#006400` | darkgreen |
| `#FF4500` | orangered |
| `#000080` | navy |
| `#FFD700` | gold |
| `#40E0D0` | turquoise |
| `#DA70D6` | orchid |
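The color table can be encoded as a lookup so that segmentation/detection tokens always use a matching hex code and name. A sketch; the `mask_token` helper is an assumption for illustration, but the token format follows the `<#00FFFF cyan mask: object/to/segment>` pattern from the task table:

```python
# Hex codes and names from the "Available colors" table above.
COLOR_HEX = {
    "red": "#FF0000", "lime": "#00FF00", "blue": "#0000FF",
    "yellow": "#FFFF00", "magenta": "#FF00FF", "cyan": "#00FFFF",
    "orange": "#FFA500", "purple": "#800080", "brown": "#A52A2A",
    "green": "#008000", "pink": "#FFC0CB", "teal": "#008080",
    "darkorange": "#FF8C00", "blueviolet": "#8A2BE2", "darkgreen": "#006400",
    "orangered": "#FF4500", "navy": "#000080", "gold": "#FFD700",
    "turquoise": "#40E0D0", "orchid": "#DA70D6",
}

def mask_token(color: str, target: str, kind: str = "mask") -> str:
    """Build one additional token, e.g. '<#00FFFF cyan mask: cat>'.

    `kind` is 'mask' for [[semanticmap2image]] or 'boundingbox' for
    [[boundingbox2image]]. Raises KeyError for colors not in the table.
    """
    return f"<{COLOR_HEX[color]} {color} {kind}: {target}>"
```

Concatenate several of these tokens to segment or detect multiple classes, each with a distinct color.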