Official Model Card for (RPG) Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

GitHub repo: https://github.com/YangLing0818/RPG-DiffusionMaster

RPG is a training-free paradigm that uses MLLMs as the prompt recaptioner and layout planner for diffusion models. The paradigm generalizes to different diffusion models. Here we provide several high-quality community models from CIVITAI, based on Stable Diffusion v1.4/1.5 and SDXL-1.0/SDXL-Turbo. We will continue to update our model_pool, including combinations with ControlNet, so stay tuned for further progress.

Stable Diffusion v1.4/1.5 based model details

For Stable Diffusion v1.4/1.5 based models, we use:

AbsoluteReality for realistic style generation.

AnythingV3 for anime style generation.

Disney Pixar Cartoon for cartoon style generation.

SDXL v1.0/SDXL-Turbo based model details

For SDXL v1.0/SDXL-Turbo based models, we use:

AlbedoBaseXL for SDXL-based photorealistic style generation.

DreamShaperXL for SDXL-Turbo based photorealistic style generation.
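
The model choices above can be summarized as a small lookup table. The following is a minimal sketch only: the dictionary layout, the base-model keys (`sd-v1`, `sdxl`, `sdxl-turbo`), and the helper `pick_model` are hypothetical names introduced for illustration and are not part of the RPG repository.

```python
# Hypothetical summary of the model pool listed in this card.
# Keys are (base model family, style); values are the community model names.
# The key strings and this helper are illustrative assumptions, not RPG's API.
MODEL_POOL = {
    ("sd-v1", "realistic"): "AbsoluteReality",
    ("sd-v1", "anime"): "AnythingV3",
    ("sd-v1", "cartoon"): "Disney Pixar Cartoon",
    ("sdxl", "photorealistic"): "AlbedoBaseXL",
    ("sdxl-turbo", "photorealistic"): "DreamShaperXL",
}


def pick_model(base: str, style: str) -> str:
    """Return the community model name for a base family and style."""
    key = (base.lower(), style.lower())
    if key not in MODEL_POOL:
        # List the valid combinations so a typo is easy to spot.
        options = ", ".join(f"{b}/{s}" for b, s in sorted(MODEL_POOL))
        raise ValueError(f"No model for {base}/{style}; available: {options}")
    return MODEL_POOL[key]


if __name__ == "__main__":
    print(pick_model("sd-v1", "anime"))
    print(pick_model("SDXL-Turbo", "photorealistic"))
```

The returned name only identifies which checkpoint to download; how the weights are then loaded depends on the pipeline code in the GitHub repo.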

Paper for BitStarWalkin/RPG_models

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (arXiv:2401.11708, published Jan 22, 2024)