---
license: apache-2.0
datasets:
- lambdalabs/pokemon-blip-captions
---

# Introduction

This is an example model of [Distill SDXL](https://github.com/okotaku/diffengine/tree/main/configs/distill_sd). Training is based on [DiffEngine](https://github.com/okotaku/diffengine), an open-source toolbox for training state-of-the-art Diffusion Models with diffusers and mmengine.

# Training

```bash
pip install openmim
pip install git+https://github.com/okotaku/diffengine.git

mim train diffengine tiny_sd_xl_pokemon_blip.py
```

More details in my blog post:

# Dataset

I used [lambdalabs/pokemon-blip-captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).

# Inference

```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, AutoencoderKL

checkpoint = 'takuoko/tiny_sd_xl_pokemon_blip'
prompt = 'a very cute looking pokemon with a hat on its head'

# Load the distilled (tiny) UNet trained in this repository.
unet = UNet2DConditionModel.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16)
# fp16-safe SDXL VAE.
vae = AutoencoderKL.from_pretrained(
    'madebyollin/sdxl-vae-fp16-fix',
    torch_dtype=torch.bfloat16,
)
# Plug both into the SDXL base pipeline.
pipe = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    unet=unet, vae=vae, torch_dtype=torch.bfloat16)
pipe.to('cuda')

image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
```

# Example result

prompt = 'a very cute looking pokemon with a hat on its head'

![image](demo.png)

# Reference

- Paper: [On Architectural Compression of Text-to-Image Diffusion Models](https://arxiv.org/abs/2305.15798)
- Unofficial implementation: https://github.com/segmind/distill-sd
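
To get a feel for the architectural compression, the following is a minimal sketch that compares parameter counts of this distilled UNet with the base SDXL UNet. It assumes, as in the inference example above, that the tiny UNet sits at the root of the `takuoko/tiny_sd_xl_pokemon_blip` repository; the exact numbers depend on the architecture defined in `tiny_sd_xl_pokemon_blip.py`.

```python
import torch
from diffusers import UNet2DConditionModel

# Distilled (tiny) UNet from this repository.
tiny_unet = UNet2DConditionModel.from_pretrained(
    'takuoko/tiny_sd_xl_pokemon_blip', torch_dtype=torch.bfloat16)

# Original SDXL base UNet for comparison.
base_unet = UNet2DConditionModel.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0', subfolder='unet',
    torch_dtype=torch.bfloat16)

# Count parameters of each UNet and print them in millions.
tiny_params = sum(p.numel() for p in tiny_unet.parameters())
base_params = sum(p.numel() for p in base_unet.parameters())
print(f'tiny UNet parameters: {tiny_params / 1e6:.1f}M')
print(f'base UNet parameters: {base_params / 1e6:.1f}M')
```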