Work in progress: training is ongoing

⚡️Waifu: efficient high-resolution waifu synthesis

Waifu is a free text-to-image model that efficiently generates images from prompts in 80 languages. Our goal is to build a small model without compromising on quality.

Core designs include:

(1) AuraDiffusion/16ch-vae: a fully open-source 16-channel VAE, natively trained in fp16.
(2) Linear DiT: we use a 1.6B-parameter DiT transformer with linear attention.
(3) MEXMA-SigLIP: a model that combines the MEXMA multilingual text encoder with the image encoder from SigLIP. This gives us a high-performance CLIP-style model for 80 languages.
(4) Other: we use the Flow-Euler sampler, the Adafactor-Fused optimizer, and bf16 precision for training, and we combine efficient caption labeling (MoonDream, CogVLM, human, GPT) with Danbooru tags to accelerate convergence.
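
To illustrate why linear attention in item (2) is cheaper than softmax attention, here is a minimal pure-Python sketch. It assumes a ReLU feature map, as is common in Sana-style linear attention; the kernel actually used by this model is not stated here, and the function names (`linear_attention`, `quadratic_reference`) are hypothetical. The idea is to sum the key-value products once and reuse that summary for every query, so the cost grows linearly with the number of tokens.

```python
def relu(vec):
    """Feature map phi(x) = max(0, x), applied element-wise (an assumption)."""
    return [max(0.0, x) for x in vec]


def linear_attention(queries, keys, values, eps=1e-6):
    """Linear-time attention: build the key/value summary once, reuse per query."""
    d, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d                     # sum_j phi(k_j)
    for k, v in zip(keys, values):
        fk = relu(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    outputs = []
    for q in queries:
        fq = relu(q)
        norm = sum(fq[a] * k_sum[a] for a in range(d)) + eps
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return outputs


def quadratic_reference(queries, keys, values, eps=1e-6):
    """Same result computed pairwise (quadratic in token count), for comparison."""
    outputs = []
    for q in queries:
        fq = relu(q)
        weights = [sum(x * y for x, y in zip(fq, relu(k))) for k in keys]
        norm = sum(weights) + eps
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / norm
             for b in range(len(values[0]))]
        )
    return outputs


# Toy inputs: 2 query tokens, 3 key/value tokens, dimension 2.
queries = [[1.0, -0.5], [0.3, 0.8]]
keys = [[0.2, 1.0], [1.5, -0.3], [0.7, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

fast = linear_attention(queries, keys, values)
slow = quadratic_reference(queries, keys, values)
```

Both functions produce the same outputs up to floating-point rounding; only the order of the sums differs, which is what removes the token-by-token attention matrix.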

Example

import torch
from diffusers import DiffusionPipeline

from transformers import XLMRobertaTokenizerFast, XLMRobertaModel
from diffusers import FlowMatchEulerDiscreteScheduler
from diffusers.models import AutoencoderKL
from diffusers import SanaTransformer2DModel

pipe_id = "AiArtLab/waifu-2b"
variant = "fp16"
# tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained(
    pipe_id,
    subfolder="tokenizer"
)

# text_encoder
text_encoder = XLMRobertaModel.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="text_encoder",
    add_pooling_layer=False
).to("cuda")

# scheduler (note: created here but never passed to the pipeline below, so it is currently unused)
scheduler = FlowMatchEulerDiscreteScheduler(shift=1.0)

# VAE
vae = AutoencoderKL.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="vae"
).to("cuda")

# Transformer
transformer = SanaTransformer2DModel.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="transformer"
).to("cuda")

# Pipeline
pipeline = DiffusionPipeline.from_pretrained(
    pipe_id,
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    vae=vae,
    transformer=transformer,
    trust_remote_code=True,
).to("cuda")
print(pipeline)

prompt = 'аниме девушка, waifu, يبتسم جنسيا , sur le fond de la tour Eiffel'
generator = torch.Generator(device="cuda").manual_seed(42)

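image = pipeline(
    prompt=prompt,
    negative_prompt="",
    generator=generator,
)[0]

# Save each image under its own name so multiple outputs do not overwrite each other
for i, img in enumerate(image):
    img.show()
    img.save(f'waifu_{i}.png')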
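
For readers unfamiliar with the Flow-Euler sampler, the following is a rough pure-Python sketch of the Euler update that a flow-matching scheduler such as `FlowMatchEulerDiscreteScheduler` performs. It is a simplification under stated assumptions: the noise levels are evenly spaced from 1 to 0, the `shift` warp is the usual `shift * s / (1 + (shift - 1) * s)` form, and `toy_velocity` is a hypothetical stand-in for the transformer that already knows the clean target. The names `shifted_sigmas` and `euler_sample` are illustrative and not part of diffusers.

```python
def shifted_sigmas(num_steps, shift=1.0):
    """Noise levels from 1 (pure noise) down to 0 (clean), warped by `shift`."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def euler_sample(velocity, x, num_steps=20, shift=1.0):
    """Integrate dx/dsigma = velocity(x, sigma) from sigma=1 to sigma=0 with Euler steps."""
    sigmas = shifted_sigmas(num_steps, shift)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity(x, sigma)
        # Euler update: move along the predicted velocity by the change in noise level.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical "model": on the straight path x = target + sigma * (noise - target),
# the exact velocity is (x - target) / sigma.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, sigma):
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


noise = [1.3, 0.2, -0.7]
sample = euler_sample(toy_velocity, noise, num_steps=20)
```

With this idealized velocity the path from noise to the target is a straight line, so Euler integration recovers the target up to rounding. A trained model only approximates the velocity, which is why the number of steps matters in practice.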

Donations

We are a small, GPU-poor group of enthusiasts (current training budget: ~$2k).

Please contact us if you can provide GPUs for training.

DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83

A fluffy domestic cat with piercing green eyes sits attentively in a sunlit room filled with natural light streaming through large windows, its soft fur reflecting warm hues of orange from the golden glow cast across its sleek body and delicate features

Contacts

recoilme

How to cite

@misc{Waifu, 
    url    = {https://huggingface.co/AiArtLab/waifu-2b},
    title  = {waifu-2b}, 
    author = {recoilme and muinez and femboysLover}
}