Utilize HF's "balanced" device_map + dynamically pair diffusion components to relevant execution cores

#1
by diopside - opened

By using balanced mode and explicitly pairing diffusion components on grouped GPUs, we avoid OOM and are able to run on 4*40Ls.
Distribution approach (e.g.): text encoder on GPU 1 - 16.6GB, everything else on GPU 2 - 44.5GB, including: ControlNet (4.23GB), VAE (254MB), transformer (40GB).
This keeps the overall memory usage efficiently split across the GPUs while ensuring all components that need to interact directly are on the same device.
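
To make the grouping idea concrete, here is a minimal sketch in plain Python. It is not the code in this PR: `plan_placement` is a hypothetical helper, and the 48 GB per-device capacity is an assumption chosen for illustration. It treats each set of components that must interact directly as one group and places the whole group on the first device with enough free memory. Device indices are 0-based, so device 0 corresponds to "GPU 1" above.

```python
from typing import Dict, List


def plan_placement(
    groups: List[Dict[str, float]],
    capacity_gb: float,
    num_devices: int,
) -> Dict[str, int]:
    """First-fit placement: each group lands whole on the first device with room."""
    used = [0.0] * num_devices  # GB already reserved on each device
    placement: Dict[str, int] = {}
    for group in groups:
        need = sum(group.values())
        for dev in range(num_devices):
            if used[dev] + need <= capacity_gb:
                used[dev] += need
                for name in group:
                    placement[name] = dev
                break
        else:
            # No single device can hold the whole group.
            raise MemoryError(f"no device has {need:.2f} GB free for {sorted(group)}")
    return placement


# Sizes (GB) taken from the description above; the grouping mirrors it:
# the text encoder alone, everything else kept together.
groups = [
    {"text_encoder": 16.6},
    {"controlnet": 4.23, "vae": 0.254, "transformer": 40.0},
]

# capacity_gb=48.0 is an assumed per-GPU budget, not a figure from this PR.
placement = plan_placement(groups, capacity_gb=48.0, num_devices=4)
print(placement)
```

Under these assumptions the text encoder lands on device 0 and the ControlNet, VAE, and transformer share device 1, which matches the split described above. Components in the same group never end up on different devices, so no cross-device transfers are needed between them.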

@instantx-admin @linoyts shabbat shalom, please review 😄

Ready to merge
This branch is ready to get merged automatically.
