
SD3.5 + FOCUS

SD3.5 fine-tuned for multi-subject prompts

TL;DR: A fine-tuned derivative of stabilityai/stable-diffusion-3.5-medium focused on multi-subject fidelity: keeping multiple entities and their attributes disentangled while preserving the base model's style. Works across animals, people, and objects.
Read the paper: Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity.

⚠️ Licensing: This model inherits the Stability AI Community License from the base model and is distributed under compatible terms. Use is subject to the base model's license.


What’s improved

  • Entity disentanglement: better separation across 2–4 subjects, fewer merges/omissions.
  • Attribute binding: colors, clothing, and small accessories stick to the correct subject.
  • Single subject: single-subject generation also improves, while staying stylistically close to the base model.

Quick start (Diffusers)

Install the 🧨 diffusers and transformers libraries:

pip install -U transformers==4.53.0 diffusers==0.33.1

Then:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "ericbill21/focus_sd35",
    torch_dtype=torch.float16
).to("cuda")
# For smaller GPUs use: pipe.enable_sequential_cpu_offload()

image = pipe(
    prompt="A horse and a bear in a forest",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]

image.save("sample.png")

Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
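
For example, on memory-constrained GPUs a minimal sketch might look like the following (call these instead of .to("cuda"); exact support depends on your diffusers version):

# Optional memory optimizations (sketch; adjust to your hardware)
pipe.enable_model_cpu_offload()  # move submodules to GPU only while they run
pipe.enable_vae_slicing()        # decode latents in slices to reduce peak VRAM
pipe.enable_vae_tiling()         # tile the VAE for larger resolutions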

How was this achieved?

We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
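
Schematically, the stochastic-optimal-control view of reward fine-tuning optimizes an objective of roughly the following form; this is the generic formulation, given here only for intuition, while the exact FOCUS objective and its adjoint-matching approximation are in the paper:

$$
\min_{u}\ \mathbb{E}\left[\int_0^1 \tfrac{1}{2}\,\lVert u_t(X_t)\rVert^2\,\mathrm{d}t \;-\; r(X_1)\right]
\quad\text{s.t.}\quad
\mathrm{d}X_t = \bigl(b_t(X_t) + u_t(X_t)\bigr)\,\mathrm{d}t + \sigma_t\,\mathrm{d}W_t,
$$

where $b_t$ is the base model's flow-matching drift, $u_t$ is the learned controller, and $r$ scores multi-subject fidelity of the final sample $X_1$; the quadratic control cost keeps the fine-tuned sampler close to the base distribution.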

Model details

  • Base: stabilityai/stable-diffusion-3.5-medium
  • Type: full pipeline (no LoRA required at inference)
  • Intended use: research/creative work where multi-subject consistency matters
  • Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.

Citation

If you find this useful, please cite:

@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}

Contact

Feedback and issues welcome via the Hugging Face model page or GitHub.
