SD3.5 fine-tuned for multi-subject prompts
TL;DR: A fine-tuned derivative of stabilityai/stable-diffusion-3.5-medium focused on multi-subject fidelity. It keeps multiple entities and their attributes disentangled while preserving the base model's style, and works across animals, people, and objects.
Read the paper: Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity.
⚠️ Licensing: This model inherits the Stability AI Community License from the base model and is distributed under compatible terms. Use is subject to the base model's license.
What’s improved
- Entity disentanglement: better separation across 2–4 subjects, fewer merges/omissions.
- Attribute binding: colors, clothing, and small accessories stick to the correct subject.
- Single-subject quality: also improves single-subject generation while staying stylistically close to the base model.
Quick start (Diffusers)
Install the 🧨 diffusers library:
```bash
pip install -U transformers==4.53.0 diffusers==0.33.1
```
Then:
```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "ericbill21/focus_sd35",
    torch_dtype=torch.float16,
).to("cuda")
# On smaller GPUs, drop .to("cuda") above and call pipe.enable_sequential_cpu_offload() instead.

image = pipe(
    prompt="A horse and a bear in a forest",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]
image.save("sample.png")
```
Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
How was this achieved?
We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
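To make the control framing concrete, here is a toy sketch of the generic objective used when fine-tuning flow or diffusion samplers as a stochastic optimal control problem. It is not the FOCUS training code. It rolls out a controlled SDE dX = (b(X, t) + σ(t) u(X, t)) dt + σ(t) dW with Euler-Maruyama and accumulates the quadratic control cost ½∫‖u‖² dt, which penalizes drifting away from the base sampler. The base drift `b`, noise schedule `σ`, controller `u`, and terminal `reward` are hypothetical placeholders. In the generic formulation the objective subtracts a terminal reward on X₁; how FOCUS encodes multi-subject fidelity, and how the controller is trained via adjoint matching, are described in the paper and not reproduced here.

```python
import numpy as np


def rollout_controlled_sampler(x0, base_drift, control, sigma, reward, n_steps=28, seed=0):
    """Toy Euler-Maruyama rollout of a controlled sampler on t in [0, 1].

    Simulates dX = (b(X, t) + sigma(t) * u(X, t)) dt + sigma(t) dW and returns
    (x1, control_cost, objective), where
        control_cost = 0.5 * integral ||u||^2 dt   (stay close to the base sampler)
        objective    = control_cost - reward(x1)  (what a controller would minimize)
    base_drift, control, sigma, and reward are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    control_cost = 0.0
    for k in range(n_steps):
        t = k * dt
        u = control(x, t)
        s = sigma(t)
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
        x = x + (base_drift(x, t) + s * u) * dt + s * dw
        control_cost += 0.5 * float(np.sum(u ** 2)) * dt
    objective = control_cost - float(reward(x))
    return x, control_cost, objective
```

With a zero controller the control cost is exactly zero and the rollout reduces to the base sampler; any deviation has to "pay" through the quadratic term, which is the mechanism behind staying close to the base distribution.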
Model details
- Base: stabilityai/stable-diffusion-3.5-medium
- Type: full pipeline (no LoRA required at inference)
- Intended use: research/creative work where multi-subject consistency matters
- Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.
Citation
If you find this useful, please cite:
```bibtex
@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}
```
Contact
Feedback and issues welcome via the Hugging Face model page or GitHub.