SD3.5 fine-tuned for multi-subject prompts

TL;DR: A fine-tuned derivative of stabilityai/stable-diffusion-3.5-medium focused on multi-subject fidelity: it keeps multiple entities and their attributes disentangled while preserving the base model’s style. Works across animals, people, and objects.
Read the paper: Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity.

⚠️ Licensing: This model inherits the Stability AI Community License from the base model and is distributed under compatible terms. Use is subject to the base model’s license.

What’s improved

Entity disentanglement: better separation across 2–4 subjects, fewer merges/omissions.
Attribute binding: colors, clothing, and small accessories stick to the correct subject.
Single subject: single-subject generation also improves, while staying stylistically close to the base model.
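
To probe the behaviors listed above, one option is a small grid of prompts that pairs each subject with a distinct attribute and also swaps the attributes between subjects. The following is a minimal sketch, not the evaluation protocol from the paper; the helper name `build_prompts`, the subject and color lists, and the sentence template are all assumptions chosen for illustration.

```python
from itertools import combinations, permutations


def build_prompts(subjects, colors, n_subjects=2):
    """Return prompts that pair every subject combination with distinct colors.

    Hypothetical helper for illustration; not part of the FOCUS code base.
    """
    prompts = []
    for group in combinations(subjects, n_subjects):
        for palette in permutations(colors, n_subjects):
            parts = [f"a {color} {subject}" for color, subject in zip(palette, group)]
            # Join as "a red horse and a blue bear", or "a, b and c" for 3+ subjects.
            if len(parts) == 1:
                body = parts[0]
            else:
                body = ", ".join(parts[:-1]) + " and " + parts[-1]
            prompts.append(body[0].upper() + body[1:] + " in a forest")
    return prompts


prompts = build_prompts(["horse", "bear", "fox"], ["red", "blue"])
```

Each resulting string can be passed as `prompt` to the pipeline call in the quick start. Because every subject pair appears with both color assignments, a color landing on the wrong subject is easy to spot when the outputs are viewed side by side.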

Quick start (Diffusers)

Install the 🧨 diffusers library:

pip install -U transformers==4.53.0 diffusers==0.33.1

Then:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "ericbill21/focus_sd35",
    torch_dtype=torch.float16
).to("cuda")
# For smaller GPUs, drop .to("cuda") above and call pipe.enable_sequential_cpu_offload() instead

image = pipe(
    prompt="A horse and a bear in a forest",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]

image.save("sample.png")

Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
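
As a rough illustration of the point above, the sketch below bundles a few memory-saving switches into one function. The wrapper name `apply_memory_savers` and its defaults are assumptions made for this example; the calls inside it (`enable_model_cpu_offload` on the pipeline, `enable_slicing` and `enable_tiling` on the VAE) are standard Diffusers methods, and whether each one helps depends on your GPU and image size.

```python
def apply_memory_savers(pipe, cpu_offload=True, vae_slicing=True, vae_tiling=False):
    """Enable optional memory-saving features on an already loaded pipeline.

    Hypothetical convenience wrapper, shown for illustration only.
    """
    if cpu_offload:
        # Keeps sub-models on the CPU and moves each to the GPU only while it runs
        # (requires the accelerate package).
        pipe.enable_model_cpu_offload()
    if vae_slicing:
        # Decodes a batch of latents one image at a time.
        pipe.vae.enable_slicing()
    if vae_tiling:
        # Decodes large images tile by tile; mainly useful at high resolutions.
        pipe.vae.enable_tiling()
    return pipe
```

With offloading enabled, load the pipeline as in the quick start but without `.to("cuda")`, then call `apply_memory_savers(pipe)` before generating images.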

How was this achieved?

We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.

Paper: https://arxiv.org/abs/2510.02315
Code: https://github.com/ericbill21/FOCUS
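
For readers unfamiliar with the control view described above, a generic stochastic optimal control problem of the kind used in adjoint-matching style fine-tuning can be written as follows. This is a textbook-style sketch under assumed notation, not necessarily the exact objective or symbols used in the paper: $b$ stands for the drift of the pretrained sampler, $\sigma(t)$ for its noise schedule, $u$ for the learned control, and $g$ for a terminal cost that penalizes multi-subject failures such as merged or missing entities.

```latex
\begin{aligned}
\min_{u}\;& \mathbb{E}\!\left[\int_0^1 \tfrac{1}{2}\,\lVert u(X_t, t)\rVert^2 \,\mathrm{d}t \;+\; g(X_1)\right] \\
\text{s.t.}\;& \mathrm{d}X_t = \bigl(b(X_t, t) + \sigma(t)\,u(X_t, t)\bigr)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}B_t,
\qquad X_0 \sim p_0 .
\end{aligned}
```

In this form the quadratic term equals the KL divergence between the controlled and uncontrolled path distributions, which is one way to read "staying close to the base distribution": the control is only rewarded for deviating from the base sampler where doing so lowers $g$.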

Model details

Base: stabilityai/stable-diffusion-3.5-medium
Type: full pipeline (no LoRA required at inference)
Intended use: research/creative work where multi-subject consistency matters
Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.

Citation

If you find this useful, please cite:

@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}

Contact

Feedback and issues welcome via the Hugging Face model page or GitHub.
